Hacking Web
Intelligence
Open Source Intelligence and Web Reconnaissance
Concepts and Techniques
Sudhanshu Chauhan
Nutan Kumar Panda
ELSEVIER
AMSTERDAM • BOSTON • HEIDELBERG • LONDON
NEW YORK • OXFORD • PARIS • SAN DIEGO
SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
Syngress is an imprint of Elsevier
Copyrighted material
Acquiring Editor: Chris Katsaropoulos
Editorial Project Manager: Benjamin Rearick
Project Manager: Punithavathy Govindaradjane
Designer: Matthew Limbert
Syngress is an imprint of Elsevier
225 Wyman Street, Waltham, MA 02451, USA
Copyright © 2015 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means,
electronic or mechanical, including photocopying, recording, or any information storage
and retrieval system, without permission in writing from the publisher. Details on how to
seek permission, further information about the Publisher’s permissions policies and our
arrangements with organizations such as the Copyright Clearance Center and the Copyright
Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by
the Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and
experience broaden our understanding, changes in research methods, professional practices,
or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge
in evaluating and using any information, methods, compounds, or experiments described
herein. In using such information or methods they should be mindful of their own safety and
the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors,
assume any liability for any injury and/or damage to persons or property as a matter of
products liability, negligence or otherwise, or from any use or operation of any methods,
products, instructions, or ideas contained in the material herein.
ISBN: 978-0-12-801867-5
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress
For Information on all Syngress publications
visit our website at store.elsevier.com/Syngress
Contents
Preface
About the Authors
Acknowledgments
CHAPTER 1 Foundation: Understanding the Basics
    Introduction
    Internet
    Definition
    How it works
    World Wide Web
    Fundamental differences between internet and WWW
    Defining the basic terms
    IP address
    Port
    Protocol
    MAC address
    E-mail
    Domain name system
    URL
    Server
    Web search engine
    Web browser
    Virtualization
    Web browsing—behind the scene
    Lab environment
    Operating system
    Programming language
    Browser
CHAPTER 2 Open Source Intelligence and Advanced Social Media Search
    Introduction
    Open source intelligence
    How we commonly access OSINT
    Search engines
    News sites
    Corporate websites
    Content sharing websites
    Academic sites
    Blogs
    Government sites
    Web 2.0
    Social media intelligence
    Social network
    Introduction to various social networks
    Advanced search techniques for some specific social media
    Facebook
    LinkedIn
    Twitter
    Searching any open social media website
    Web 3.0
CHAPTER 3 Understanding Browsers and Beyond
    Introduction
    Browser operations
    History of browsers
    Browser architecture
    User interface
    Browser engine
    Rendering engine
    Networking
    UI backend
    JavaScript interpreter
    Data persistence
    Error tolerance
    Threads
    Browser features
    Private browsing
    Autocomplete
    Proxy setup
    Raw browsers
    Why custom versions?
    Some of the well-known custom browsers
    Epic
    HconSTF
    Mantra
    FireCAT
    Oryon C
    WhiteHat Aviator
    TOR bundle
    Custom browser category
    Pros and cons of each of these browsers
    Addons
    Shodan
    Wappalyzer
    Buildwith
    Follow
    Riffle
    WhoWorks.at
    Onetab
    SalesLoft
    Project Naptha
    Tineye
    Reveye
    Contactmonkey
    Bookmark
    Threats posed by browsers
CHAPTER 4 Search the Web—Beyond Convention
    Introduction
    Meta search
    People search
    Business/company search
    Reverse username/e-mail search
    Semantic search
    Social media search
    Twitter
    Source code search
    Technology information
    Reverse image search
    Miscellaneous

CHAPTER 5 Advanced Web Searching
    Introduction
    Google
    Bing
    Yahoo
    Yandex
CHAPTER 6 OSINT Tools and Techniques
    Introduction
    Creepy
    TheHarvester
    Shodan
    Search Diggity
    Recon-ng
    Case 1
    Case 2
    Case 3
    Case 4
    Yahoo Pipes
    Maltego
    Entity
    Transform
    Machine
    Investigate
    Manage
    Organize
    Machines
    Collaboration
    Domain to website IP addresses
    Domain to e-mail address
    Person to website

CHAPTER 7 Metadata
    Introduction
    Metadata extraction tools
    Jeffrey's Exif Viewer
    Exif Search
    ivMeta
    Hachoir-metadata
    FOCA
    Metagoofil
    Impact
    Search Diggity
    Metadata removal/DLP tools
    MetaShield Protector
    MAT
    MyDLP
    Flowcharts
    Maltego
    CaseFile
    MagicTree
    KeepNote
    Lumify
    Xmind

CHAPTER 11 Online Security
    Introduction
    Malwares
    Virus
    Trojan
    Ransomware
    Keylogger
    Phishing
    Online scams and frauds
    Hacking attempts
    Weak password
    Shoulder surfing
    Social engineering
    Antivirus
    Identify phishing/scams
    Update operating system and other applications
    Addons for security
    Web of trust (WOT)
    HTTPS Everywhere
    NoScript
    Tools for security
    Password policy
    Precautions against social engineering
    Data encryption

CHAPTER 12 Basics of Social Networks Analysis
    Introduction
    Nodes
    Edges
    Network
    Gephi
    Overview
    Data laboratory
    Preview
    Node attributes
    Edge attributes
    Direction
    Type
    Weight
    Ranking
    Betweenness

CHAPTER 13 Quick and Dirty Python
    Introduction
    Programming versus scripting
    Introduction to Python
    Installation
    Modes
    Hello world program
    Identifiers
    Data types
    Indentation
    Modules
    Functions
    Classes
    Working with files
    User input
    Common mistakes
    Maltego transforms
    Resource

CHAPTER 14 Case Studies and Examples
    Introduction
    Case studies
    Case study 1: The BlackHat Mashup
    Case study 2: A demo that changed audience view
    Case study 3: An epic interview
    Maltego machines
CHAPTER 15 Related Topics of Interest
    Introduction
    Cryptography
    Basic types
    Data recovery/shredding
    Internet Relay Chat
    Bitcoin

Index
Preface
It was just another day at work; as usual we were supposed to configure some scans, validate some results, and perform some manual tests. We had been working with our team on some pentesting projects. Unlike many other jobs, pentesting is rarely boring; honestly, who doesn’t enjoy finding flaws in someone’s work and getting paid for it? So, following the process, we did some recon and found some interesting information about the target. We started digging deeper and soon had enough information to compromise the target. We finished the rest of the process and sent out the reports to the clients, who were more than happy with the results.
Later that evening we were discussing the tests and realized that most of the information which allowed us to get a foothold in the target was actually public information. The target had already revealed too much about itself, and it was just a matter of connecting the dots. It ended there and we almost forgot about it. Then, on another fine day, we were working on some other project and the same thing happened again. So we decided to document all the tools and techniques we were aware of and create a shared document, which we both could contribute to. Any time we encountered some new method to discover public information, we added it to the document. Soon we realized that the document had grown too long and we needed to categorize and filter it.
Though the topic has been widely known and utilized in pentesting and red team exercises, when we tried to find documented work on it, we didn’t find anything substantial. This is when we started thinking of converting our document into a book.
While researching the topic we understood that there is a great deal of public information which is easily accessible. Most of it might not seem very useful at first glance, but once collected and correlated, it can bring phenomenal results. We also realized that it is not just pentesting where it is of prime importance to collect information about the target; there are many other professions which utilize similar methods. For example, sales reps find information about prospective clients, and marketing professionals collect information related to the market and competition. Keeping that in mind, we have tried to keep the tone and flow of the book easy to follow, without compromising on the technical details. The book moves from defining the basics, to learning more about the tools we are already familiar with, and finally toward more technical stuff.
WHAT THIS BOOK COVERS
Hacking Web Intelligence has been divided into different sections according to complexity and mutual dependency. The first few chapters are about the basics and dive deep into topics most of us are already familiar with. The middle section talks
about the advanced tools and techniques, and in the later portion we talk about actually utilizing and implementing what we discussed in previous sections.
While following the book it is suggested to not just read it but practice it. The examples and illustrations are included to show how things work and what to expect as a result. It is not just about using a tool but also understanding how it does so, as well as what to do with the information collected. Most of the tools will be able to collect information, but to complete the picture we need to connect these dots. On the other hand, like any tool, the ones we will be using might be updated, modified, or even deprecated, and new ones might show up with different functionality, so stay updated.
HOW DO YOU PRACTICE
All you need is a desktop or laptop with any operating system, different browsers such as Mozilla Firefox, Chrome, or Chromium, and internet connectivity. Readers will be assisted in downloading and installing tools and dependencies based on the requirements of each chapter.
WHO THIS BOOK IS FOR
The book focuses mainly on professionals related to information security/intelligence/risk management/consulting, but unlike “from Hackers to the Hackers” books it will also be helpful and understandable to laymen who require information gathering as a part of their daily job, such as marketing, sales, journalism, etc.
The book can be used in any intermediate-level information security course for the reconnaissance phase of a security assessment.
We hope that as a reader you learn something new which you can practice in your daily life to make it easier and more fruitful, like we did while creating it.
Sudhanshu Chauhan
Principal Consultant, Noida, India
Nutan Kumar Panda
Information Security Engineer, Bangalore, India
About the Authors
SUDHANSHU CHAUHAN
Sudhanshu Chauhan is an information security professional and OSINT specialist. He has worked in the information security industry, previously as a senior security analyst at iViZ and currently as director and principal consultant at Octogence Tech Solutions, a penetration testing consultancy. He previously worked at the National Informatics Center in New Delhi developing web applications to prevent threats. He holds a BTech (CSE) from Amity School of Engineering and a Diploma in cyber security. He has been listed in various Halls of Fame, such as those of Adobe, Barracuda, Yandex, and Freelancer. Sudhanshu has also written various articles on a wide range of topics including cyber threats, vulnerability assessment, honeypots, and metadata.
NUTAN KUMAR PANDA
An information security professional with expertise in the field of application and network security. He has completed his BTech (IT) and has also earned various prestigious certifications in his domain, such as CEH, CCNA, etc. Apart from performing security assessments he has also been involved in conducting/imparting information security training. He has been listed in various prestigious Halls of Fame, such as those of Google, Microsoft, Yandex, etc., and has also written various articles/technical papers. Currently he is working as an Information Security Engineer at eBay Inc.
Acknowledgments
SUDHANSHU CHAUHAN
I would like to dedicate this book to my family, my friends, and the whole security community, which is so open in sharing knowledge. A few people I would like to name who have encouraged and motivated me through this journey are Shubham, Chandan, Sourav da, and especially my mother Kanchan.
NUTAN KUMAR PANDA
I would like to dedicate this book to my parents and my lovely sister for believing in me and encouraging me; my friend, well-wisher, and coauthor Sudhanshu for all the help and support during the writing of this book; and last but not the least, all my friends, colleagues, and especially Somnath da and the members of Null: The Open Security Community, for always being there and giving their valuable suggestions in this process. Love you all.
CHAPTER 1
Foundation: Understanding the Basics
INFORMATION IN THIS CHAPTER
• Information overload
• What is internet
• How it works
• What is World Wide Web
• Basic underlying technologies
• Environment
INTRODUCTION
Information Age. The period of human evolution in which we all are growing up. Today the internet is an integral part of our life. We have all started living a dual life; one is our physical life and the other is the online one, where we exist as a virtual entity. In this virtual life we have different usernames, aliases, profile pictures, and what not in different places. We share our information intentionally and sometimes unintentionally in this virtual world of ours. If we ask ourselves how many websites we’re registered on, most probably we won’t be able to answer that question with an exact number. The definition of being social is changing from meeting people in person to doing Google Hangouts and being online on different social networking sites. In the current situation it seems that technology is evolving so fast that we need to keep up with its pace.
The evolution of computation power has been very rapid. From an era of a limited amount of data we have reached times where there is information overload. Today technologies like Big Data and Cloud Computing are the buzzwords of the IT industry, both of which deal with handling huge amounts of data. This evolution certainly has its pros as well as cons; from a data extraction point of view we need to understand both and evaluate how we can utilize them to our advantage ethically. The main obstacle in this path is not the deficiency of information but, surprisingly, the abundance of it present at the touch of our fingertips. At this stage what we require are relevant and efficient ways to extract actionable intelligence from this enormous data ocean.
Hacking Web Intelligence. http://dx.doi.org/10.1016/B978-0-12-801867-5.00001-X
Copyright © 2015 Elsevier Inc. All rights reserved.
Extracting the data which could lead toward a fruitful result is like looking for a needle in a haystack. Though sometimes the information which could play a game-changing role is present openly and is free to access, if we don’t know how to find it in a timely fashion, or worse, that it even exists, a huge amount of critical resources is wasted. During the course of this book we will be dealing with practical tools and techniques which will not only help us find information in a timely manner but also help us analyze such information for better decision making. This could make a huge difference for the people dealing with such information as a part of their daily job, such as pentesters, due diligence analysts, competitive intelligence professionals, etc.
Let’s straightaway jump in and understand the internet we all have been using for so long.
INTERNET
The internet, as we know it, has evolved from a project funded by DARPA within the US Department of Defense. The initial network was used to connect universities and research labs within the US. This phenomenon slowly spread worldwide, and today it has taken the shape of the giant network which allows us to connect with the whole world within seconds.
DEFINITION
Simply said, the internet is a global network of interlinked computers using dedicated routers and servers, which allows its end users to access data scattered all over the world. These interconnected computers follow a particular set of rules to communicate, in this case the internet protocol (IP), for transmitting data.
HOW IT WORKS
If you bought this book and are reading it, then you must already know how the internet works, but still it’s our duty to brush up on some basics, though not deeply. As stated above, the internet is a global network of interconnected computers, and lots of devices collaboratively make the internet work: for example, routers, servers, and switches, along with other hardware like cables, antennas, etc. All these devices together create the network of networks, over which all data transmission takes place.
As in any communication, you must have endpoints, a medium, and rules. The internet also works around these concepts. Endpoints are devices like a PC, laptop, tablet, smartphone, or any other device a user uses. The medium, or nodes, are the different dedicated servers and routers connected to each other, and protocols are sets of rules that machines follow to complete tasks, such as the transmission control protocol (TCP)/IP. Some of the modes of transmission of data are telephone cables, optical fiber, radio waves, etc.
WORLD WIDE WEB
The World Wide Web (WWW), or simply the web, is a subset of the internet, or in simple words just a part of it. The WWW consists of all the public websites connected to the internet, including the client devices that access them.
It is basically a structure which consists of interlinked documents represented in the form of web pages. These web pages can contain different media types, such as plain text, images, videos, etc., and are accessed using a client application, usually a web browser. The web consists of a huge number of such interconnected pages.
FUNDAMENTAL DIFFERENCES BETWEEN INTERNET AND WWW
For most of us the web is synonymous with the internet; though it contributes to the internet, it is still just a part of it. The internet is the parent class of the WWW. On the web, information and documents are linked by website uniform resource locators (URLs) and hyperlinks. They are accessed through a browser on any end device, such as a PC or smartphone, using the hypertext transfer protocol (HTTP), and nowadays generally HTTPS. HTTP is one of the different protocols used on the internet, alongside others such as the file transfer protocol (FTP), simple mail transfer protocol (SMTP), etc., which will be discussed later.
So now that we understand the basics of the internet and the web, we can move ahead and learn about some of the basic terminologies/technologies which we will be frequently using during the course of this book.
DEFINING THE BASIC TERMS
IP ADDRESS
Anyone who has ever used a computer must have heard the term IP address. Though some of us might not understand the technical details behind it, we all know it is something associated with the computer’s address. In simple words, an IP address is the virtual address of a computer or a network device that uniquely identifies that device in a network. If our device is connected to a network we can easily find out its IP address. For a Windows user it can simply be done by opening the command prompt and typing the command “ipconfig”. It’s much the same for a Linux or Mac user: we have to open the terminal and type “ifconfig” to find out the IP address associated with the system.
The IP address is also known as the logical address and is not permanent. The IP address scheme popularly used is IPv4, though the newer version, IPv6, is soon catching up. It is represented in dotted decimal notation, for example, “192.168.0.1”. The range starts from 0.0.0.0 and goes up to 255.255.255.255. When we try to find out the IP address
associated with our system using any of the methods mentioned above, we will find that the address falls within this range.
Broadly, IP addresses are of two types:
1. Private IP address
2. Public IP address
A private IP address is used to uniquely identify a device in a local area network; it is, for instance, what distinguishes our system in the office from the other systems. The following sets of IP addresses are reserved for private IP addressing:
10.0.0.0-10.255.255.255
172.16.0.0-172.31.255.255
192.168.0.0-192.168.255.255
The procedure mentioned above can be used to check our private IP address.
A public IP address is an address which uniquely identifies a system on the internet. It’s generally provided by the Internet Service Provider (ISP). We can only check this when our system is connected to the internet. The address can be anything outside the private IP address ranges. We can check it on our system (regardless of the OS) by browsing to “whatismyipaddress.com”.
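The private ranges listed above can be checked programmatically. The following is a minimal sketch using only Python's standard library (Python is the scripting language this book returns to in a later chapter); the `is_private` helper and the sample addresses are our own illustration, not part of the book's toolset.

```python
import ipaddress

# The three reserved private IPv4 ranges, as CIDR networks.
private_nets = [
    ipaddress.ip_network("10.0.0.0/8"),        # 10.0.0.0-10.255.255.255
    ipaddress.ip_network("172.16.0.0/12"),     # 172.16.0.0-172.31.255.255
    ipaddress.ip_network("192.168.0.0/16"),    # 192.168.0.0-192.168.255.255
]

def is_private(addr: str) -> bool:
    """Return True if addr falls inside one of the private IPv4 ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in private_nets)

print(is_private("192.168.0.1"))      # True: a local network address
print(is_private("74.125.236.137"))   # False: a public address
```

Running this confirms that addresses such as 192.168.0.1 are only meaningful inside a local network, while anything outside these ranges is routable on the internet.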
PORT
We are all aware of ports like USB ports, audio ports, etc., but here we are not talking about hardware ports; what we are talking about is logical ports. In simple words, a port can be defined as a communication endpoint. Earlier we discussed how an IP address uniquely identifies a system in a network; when a port number is added to the IP address, it completes the destination address needed to communicate with that system, using the protocol associated with the provided port number. We will discuss protocols shortly, but for the time being let’s assume a protocol is a set of rules followed by all communicating parties for data exchange. Let’s say a website is running on a system with IP address “192.168.0.2” and we want to communicate with that server from another system connected to the same network, with IP address “192.168.0.3”. We just have to open the browser and type “192.168.0.2:80”, where “80” is the port number used for communication, which is generally associated with the HTTP protocol. Ports are generally application specific or process specific. Port numbers are within the range 0-65535.
PROTOCOL
A protocol is a standard set of regulations and requirements used in communication between source and destination systems. It specifies how to connect and exchange data with one another. Simply stated, it is a set of rules being followed for communication between two entities over a medium.
Some popular protocols and their associated port numbers:
• 20, 21 FTP (File Transfer Protocol): Used for file transfer
• 22 SSH (Secure Shell): Used for secure data communication with another machine
• 23 Telnet (Telecommunication network): Used for data communication with another machine
• 25 SMTP (Simple Mail Transfer Protocol): Used for the management of e-mails
• 80 HTTP (Hyper Text Transfer Protocol): Used to transfer hypertext data (the web)
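The table above can be expressed in code as a simple port-to-service mapping. This short sketch is our own illustration (not from the book); the dictionary hard-codes the pairings listed above, and the comment notes how the operating system's own services database can be used to cross-check them.

```python
import socket

# Well-known TCP ports and their protocols, as listed in the table above.
well_known = {21: "ftp", 22: "ssh", 23: "telnet", 25: "smtp", 80: "http"}

for port, name in sorted(well_known.items()):
    print(f"port {port} -> {name}")

# On systems that ship a services database (e.g. /etc/services on Linux),
# the OS can confirm the same mapping:
#   socket.getservbyport(80, "tcp")  # usually returns "http"
```

This is why typing “192.168.0.2:80” in a browser reaches the web server: the port 80 part of the address selects the HTTP service on that machine.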
MAC ADDRESS
The MAC address is also known as the physical address. The MAC address, or media access control address, is a unique value assigned to the network interface by the manufacturer. The network interface is the interface used to connect the network cable. It’s represented as a hexadecimal number, for example, “00:A2:BA:C1:2B:1C”, where the first three sets of hexadecimal characters are the manufacturer number and the rest is the serial number. Now let’s find the MAC address of our system.
For a Windows user it can simply be done by opening the command prompt and typing the command “ipconfig /all” or “getmac”. It’s much the same for a Linux or Mac user: we have to open the terminal and type “ifconfig -a” to find out the MAC address associated with the system. Now let’s note down the MAC address/physical address of our system’s network interface and find out the manufacturer name: search for the first three sets of hexadecimal characters in Google to get the manufacturer name.
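The same manufacturer-prefix lookup can be sketched in Python. This is our own illustration, not from the book: `uuid.getnode()` reads the 48-bit MAC of one interface as an integer (per the Python docs it may return a random number if no MAC can be read), and the first three octets form the manufacturer (OUI) prefix to search for.

```python
import uuid

# Read one interface's MAC address as a 48-bit integer.
mac = uuid.getnode()

# Format it as six colon-separated hex octets, high byte first.
mac_str = ":".join(f"{(mac >> shift) & 0xff:02X}" for shift in range(40, -1, -8))

# The first three octets are the manufacturer (OUI) prefix.
oui = mac_str[:8]
print("MAC address:", mac_str)
print("Manufacturer prefix to search for:", oui)
```

For the example address from the text, 00:A2:BA:C1:2B:1C, the prefix to search for would be 00:A2:BA.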
E-MAIL
E-mail is the abbreviation of electronic mail, one of the most widely used technologies for digital communication. It’s a one-click solution for exchanging digital messages from sender to receiver. The general structure of an e-mail address is “username@domainname.com”. The first part, which comes before the @ symbol, is the username of the user who registered for that e-mail service. The second part, after the @ symbol, is the domain name of the mail service provider. Apart from this, nowadays every organization which has a website registered with a domain name also creates a mail service for it. So if we work in a company with domain name “xyz.com”, our company e-mail id would be “ourusername@xyz.com”. Some popular e-mail providers are Google, Yahoo, Rediff, AOL, Outlook, etc.
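The two-part structure described above maps directly onto a one-line split. A tiny sketch, purely illustrative (the helper name is ours):

```python
def split_email(address: str):
    """Split an e-mail address into its username and provider-domain parts."""
    username, _, domain = address.partition("@")
    return username, domain

print(split_email("username@domainname.com"))  # ('username', 'domainname.com')
```

The domain part is what ties an address back to an organization, which is why it matters later when gathering information about a target company.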
DOMAIN NAME SYSTEM
The domain name system (DNS), as the name suggests, is a naming system for the resources connected to the internet. It maintains a hierarchical structure of this naming scheme through a channel of various DNS servers scattered over the internet.
For example, let’s take google.com; it’s a domain name of Google Inc. Google has its servers present in different locations, and different servers are uniquely assigned different IP addresses. It is difficult for a person to remember all
the IP addresses of the different servers he/she wants to connect to, so DNS comes in, allowing a user to remember just the name instead of all those IP addresses. In this example we can easily divide the domain name into two parts. The first part is the name, generally associated with the organization name or the purpose for which the domain was bought, as here Google is the organization name in google.com. The second part, or the suffix, describes the type of the domain; here “com” is used for commercial or business purpose domains. These suffixes are also known as top-level domains (TLDs).
SOME EXAMPLES OF TLDS:
• net: network organization
• org: non-profit organization
• edu: educational institutions
• gov: government agencies
• mil: military purpose
Another popular suffix class is the country code top-level domain (ccTLD). Some examples are:
• in: India
• us: United States
• uk: United Kingdom
DNS is an integral part of the internet, acting as its yellow pages. We simply need to remember the resource name and DNS will resolve it into a virtual address which can be easily accessed on the internet. For example, google.com resolves to the IP address 74.125.236.137 for a specific region of the internet.
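The resolution step just described can be exercised directly from Python's standard `socket` module, which asks the OS resolver (and through it, DNS) to turn a name into an IPv4 address. This sketch is our own illustration; note that the address a public name like google.com resolves to varies by region and over time, so the 74.125.236.137 example in the text should be treated as indicative only.

```python
import socket

# "localhost" resolves locally, so this line works even without internet access.
print("localhost ->", socket.gethostbyname("localhost"))  # typically 127.0.0.1

# With internet connectivity the same call resolves public names too, e.g.:
#   socket.gethostbyname("google.com")
# The result depends on your region and on Google's current server assignments.
```

This is exactly the lookup a browser performs, behind the scenes, before it can open a connection to a website.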
URL
A URL, or uniform resource locator, can simply be understood as an address used to access web resources. It is basically a web address.
For example, http://www.example.com/test.jpg. This can be divided into five parts, which are:
1. http
2. www
3. example
4. com
5. /test.jpg
The first part specifies the protocol used for communication, in this case HTTP; in other cases other protocols can be used, such as HTTPS or FTP. The second part specifies whether the URL points to the main domain or a subdomain; www is generally used for the main domain, and some popular subdomains are blog, mail, career, etc. The third and fourth parts are associated with the domain
name and the type of domain name, which we just came across in the DNS part. The last part specifies a file, “test.jpg”, which needs to be accessed.
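The five-part split described above can be reproduced with Python's standard `urllib.parse` module; this sketch is our own illustration of the decomposition, not something the book prescribes.

```python
from urllib.parse import urlparse

# Break the example URL from the text into the parts described above.
parts = urlparse("http://www.example.com/test.jpg")
print(parts.scheme)   # 'http'            - part 1: the protocol
print(parts.netloc)   # 'www.example.com' - parts 2-4: subdomain.domain.TLD
print(parts.path)     # '/test.jpg'       - part 5: the file being accessed

# Splitting the host on dots recovers parts 2, 3, and 4 individually.
print(parts.netloc.split("."))  # ['www', 'example', 'com']
```

Changing the URL to an https:// or ftp:// one changes only `parts.scheme`, which mirrors how the first part of a URL selects the protocol.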
SERVER
A server is a computer program which provides a specific type of service to other programs. These other programs, known as clients, can be running on the same system or in the same network. There are various kinds of servers, with different hardware requirements depending upon factors like the number of clients, bandwidth, etc. Some of the kinds of servers are:
Web server: Used for serving websites.
E-mail server: Used for hosting and managing e-mails.
File server: Used to host and manage file distribution.
WEB SEARCH ENGINE
A web search engine is a software application which crawls the web to index it and provides information based on the user’s search query. Some search engines go beyond that and also extract information from various open databases. Usually search engines provide real-time results based upon the backend crawling and data analysis algorithms they use. The results of a search engine are usually represented in the form of URLs with an abstract.
Apart from the usual web search engines, some search engines also index data from various forums and other closed portals (which require login). Some search engines also collect search results from various other search engines and provide them in a single interface.
WEB BROWSER
A web browser is a client-side application which provides the end user with the capability to interact with the web. A browser contains an address bar, where the user enters the web address (URL); this request is then sent to the destination server and the contents are displayed within the browser interface. The response to the request sent by the client consists of raw data along with the associated format for that data.
Earlier browsers had limited functionality, but nowadays, with various features such as downloading content, bookmarking resources, saving credentials, etc., and new add-ons coming up every day, browsers are becoming very powerful. The advent of cloud-based applications has also hugely contributed to making browsers the most widely used software.
VIRTUALIZATION
Virtualization can be described as the technique of abstracting physical resources,
with the aim of simplification and utilization of the resources with ease. It can consist
of anything from a hardware platform to a storage device or OS, etc. Some of the classifications of virtualization are:
Hardware/platform: Creation of a virtual machine that performs like a real computer with an OS. The machine on which the virtualization takes place is the host machine, and the virtual machine is the guest machine.
Desktop: The concept of separating the logical desktop from the physical machine. The user interacts with the host machine over a network using another device.
Software: OS-level virtualization is the hosting of multiple virtualized environments within a single OS instance. Application virtualization is the hosting of individual applications in an environment separated from the underlying OS. In service virtualization the behavior of a dependent system component is emulated.
Network: Creation of a virtualized network addressing space within or across network subnets.
WEB BROWSING—BEHIND THE SCENE
Now that we have put some light on some of the technological keywords that we will
be dealing with in later chapters, let's dive a little deeper and try to understand what
exactly happens when we browse a website. When we enter a URL in a browser,
it divides the URL into parts. Let's say we entered "http://www.example.com".
The two parts of this URL will be (1) http and (2) www.example.com. This is done
to identify the protocol to be used and the domain name to resolve to an IP address.
Let's assume that the IP address associated with the domain name example.com
is "192.168.1.4"; the browser will then process it as "192.168.1.4:80", as 80 is the
default port number associated with the HTTP protocol.
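The split described above can be sketched with Python's standard urllib.parse module; the URL and IP-independent defaults below are just the examples used in the text:

```python
# A rough sketch of the URL splitting a browser performs, using the
# standard urllib.parse module.
from urllib.parse import urlsplit

url = "http://www.example.com"
parts = urlsplit(url)

protocol = parts.scheme   # "http"
host = parts.hostname     # "www.example.com"

# Default ports per protocol; a browser falls back to these when the
# URL does not specify one explicitly (e.g., http://example.com:8080).
default_ports = {"http": 80, "https": 443}
port = parts.port or default_ports[protocol]

print(protocol, host, port)   # http www.example.com 80
```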
As discussed in the section on DNS, it is used to resolve a domain name into
an IP address, but how? The exact steps depend on whether we are visiting a site
for the first time or we visit it often, but in both cases the procedure remains much
the same. The DNS lookup starts with the browser cache: the browser checks
whether it already has a record for this site from an earlier visit. If the browser
cache does not contain any information, the browser makes a system call to check
whether the OS has a DNS record in its cache. If it is not found there either, the
router cache is searched, then the ISP DNS cache, and finally, if no record is found
in any of these places, a recursive search is started from the root name servers
down through the top-level name servers to resolve the domain name. One thing
we need to keep in mind is that some domain names, such as google.com, are
associated with multiple IP addresses; even in that case the lookup returns only
one IP address, chosen based on the geographic location of the user who intends
to use that resource. This technique is also known as geographic DNS.
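The final hand-off to the system resolver can be sketched with Python's standard socket module. This is only a sketch: the layered browser/OS/router/ISP caching described above happens transparently below this call.

```python
# Ask the OS resolver for the addresses behind a domain name.
import socket

def resolve(domain):
    # getaddrinfo may return several records when a name maps to
    # multiple IP addresses (as with large sites like google.com);
    # each record's sockaddr field starts with the IP address.
    infos = socket.getaddrinfo(domain, 80, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})

# Example (requires network access):
# print(resolve("www.example.com"))
```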
In the above paragraph we saw how a DNS lookup searches for information
starting from the browser cache, but caching helps mainly for static sites, because
dynamic sites contain dynamic content that expires quickly. However, the process
is much the same in both cases.
After DNS resolution, the browser opens a TCP connection to the server and sends a
hypertext request based on the protocol mentioned in the URL; as it is HTTP in our
case, the browser will send an HTTP GET request to the server through the TCP
connection. The browser will then receive an HTTP response from the server with a
status code. In simple words, status codes describe the server's status for the request.
There are different types of status codes, and the topic is huge on its own; hence,
just for our understanding, we include some of the popular status codes that a user
might encounter while browsing.
[Figure 1.1: Web browsing—behind the scene. The user inputs a URL
(http://www.example.com/), the DNS resolution process yields IP 192.168.1.4 and
port 80, and the browser receives an HTTP response from the server.]
HTTP STATUS CODE CLASSES
They lie between 100 and 505 and are categorized into different classes according to their first digit.
• 1xx: Informational
• 2xx: Successful
• 3xx: Redirection
• 4xx: Client-error
• 5xx: Server-error
Some popular status codes:
• 100: Continue
• 200: OK
• 301: Moved Permanently
• 302: Found
• 400: Bad Request
• 401: Unauthorized
• 403: Forbidden
• 404: Not Found
• 500: Internal Server Error
• 502: Bad Gateway
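The exchange described above (open a TCP connection, send an HTTP GET request, read back a status code) can be sketched with Python's standard http.client module; the host name is just the example used earlier in this chapter:

```python
# Send an HTTP GET request over a TCP connection and read the status code.
import http.client

def fetch_status(host, port=80, path="/"):
    # Open a TCP connection to the server on the HTTP port.
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("GET", path)        # the HTTP GET request
        response = conn.getresponse()    # the response carries a status code
        # The first digit of the code (response.status // 100) gives its
        # class: 1 informational, 2 successful, 3 redirection,
        # 4 client error, 5 server error.
        return response.status, response.reason
    finally:
        conn.close()

# Example (requires network access):
# fetch_status("www.example.com")
```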
If the browser gets an error status code, it fails to get the resources properly; if
not, it renders the response body. The response body generally contains the HTML
code for the page contents and links to other resources, which further undergo the
same process. If the response page is cacheable, it is stored in the cache. This is the
overall process that takes place in the background when we try to browse something
on the internet using a browser.
LAB ENVIRONMENT
As we have discussed the basic concepts, now let’s move ahead and understand the
environment for our future engagements.
OPERATING SYSTEM
For a computer system to run we need basic hardware such as a motherboard, RAM,
hard disk, etc., but hardware is worthless until there is an OS to run on it. An
operating system is basically a collection of software which manages the underlying
hardware and provides basic services to the users.
Windows
One of the most widely used OSs, introduced by Microsoft in 1985. After so many
years it has evolved to a very mature stage. The current version is Windows 8.1.
Though it has had its fair share of criticism, it still holds a major percentage of the
market share. Ease of use is one of the major features of this OS and makes it
widely accepted.
Though during the writing of this book we were using Windows 7 64-bit, any
version above 7 will also be fine and would function in a more or less similar fashion.
Linux
Popular as the OS of geeks, this OS is available in many flavors. It is mostly used for
servers due to the stability and security it provides, but it is also popular among
developers, system admins, security professionals, etc. Though it surely seems a bit
different as well as difficult to use for an average user, today it has evolved to a level
where the graphical user interface (GUI) provided by some of its flavors is on par with
the Windows and Mac interfaces. The power of this OS lies in its terminal (command
line interface), which allows us to utilize all the functionality provided by the system.
We will be using Kali Linux (http://www.kali.org/), a penetration testing
distribution, during this book. It is based on Debian, a well-known, stable flavor of
Linux. Other flavors, such as Ubuntu, Arch Linux, etc., can also be used, as
most of the commands will be similar.
Mac
Developed by Apple, this series of OSs is well known for its distinctively sleek design.
In the past it has faced criticism due to the limited options available on the software
front, but as of today there is a wide range of options available. It is said to be more
secure compared to its counterparts (in the average-use domain), yet it has faced some
severe security issues.
Mac provides a powerful graphical user interface (GUI) as well as a CLI, which
makes it a good choice for any computing operation. Though we were using Mac OS
X 10.8.2 during the writing of this book, any later version will also be fine for practice.
Most of the tools which will be used during the course of this book will be free/
open source and also platform independent, though there will be some exceptions,
which will be pointed out as and when they come into play. It is recommended to have
a virtual machine of a different OS type (discussed above) apart from the base system.
To create a virtual machine we can use virtualization software such as
VirtualBox or VMware Player. Oracle VirtualBox can be downloaded from https://
www.virtualbox.org/wiki/Downloads. VMware Player can be downloaded from
http://www.vmware.com/go/downloadplayer/.
PROGRAMMING LANGUAGE
A programming language is basically a set of instructions which allows us to
communicate commands to a computing machine. Using a programming language we
can control the behavior of a machine and automate processes.
Java
Java is a high-level, object-oriented programming language developed by Sun
Microsystems, now Oracle. Due to the stability it provides, it is heavily used to
develop applications following the client-server architecture. It is one of the most
popular programming languages as of today.
Java is required to run many browser-based as well as other applications and runs
on a variety of platforms such as Windows, Linux, and Mac. The latest version of
Java can be downloaded from https://www.java.com/en/download/manual.jsp.
Python
A high-level programming language, which is often used for creating small and
efficient scripts. It is also used widely for web development. Python follows the
philosophy of code readability, which means indentation is an integral part of it.
The huge amount of community support and the availability of third-party libraries
make it the language of choice for most people who frequently need to automate
small tasks. This does not mean that Python is not powerful enough to create
full-fledged applications; Django, a Python-based web framework, is a concrete
example of that. We will discuss Python programming in detail in a later chapter.
The current version of Python is 3.4.0, though we will be using version 2.7, as
the 3.x series has had some major changes and is not backward compatible. Most of
the scripts we will be using/writing will use the 2.7 version. It can be downloaded
from https://www.python.org/download/releases/2.7.6/.
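As a taste of the small automation scripts Python is suited for, here is a sketch that extracts link targets from an HTML page using only the standard library. It is written in Python 3 syntax for portability; the book's own scripts use the 2.7 series, where the equivalent class lives in the HTMLParser module.

```python
# Extract the href targets of all anchor tags from an HTML string.
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Collect the href attribute of every <a> tag encountered.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

page = '<html><body><a href="http://example.com">Example</a></body></html>'
parser = LinkExtractor()
parser.feed(page)
print(parser.links)   # ['http://example.com']
```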
BROWSER
As discussed above, a browser is a software application which is installed at the
client's end and allows the user to interact with the web.
Chrome
Developed by Google, it is one of the most widely used browsers. First released
in 2008, today this browser has evolved to a very stable release and has left the
competition way behind. Most of its base code is available online in the form of
Chromium (http://www.chromium.org/Home).
Today Chrome is available for almost all devices which are used for web surfing,
be it a laptop, a tablet, or a smartphone. The ease of usability, stability, security, and
add-on features provided by Chrome clearly make it one of the best browsers
available. It can be downloaded from https://www.google.com/intl/en/chrome/browser/.
Firefox
Firefox is another free web browser and is developed by the Mozilla Foundation. The
customization provided by Firefox allows us to modify it to our desire. One of its
greatest features is its huge list of browser add-ons, which allow us to tailor
it for specific requirements. Similar to Chrome, it is available for various platforms. It
can be downloaded from https://www.mozilla.org/en-US/firefox/all/.
In this book we will mainly be using Chrome and Firefox as our browsers of
choice. In a later chapter we will customize both to suit our needs and will also
try out some already modified versions.
So in this chapter we have understood the basic technologies as well as the
environment we will be using. The main motivation behind this chapter is to build the
foundation so that once we are deep into our main agenda, i.e., web intelligence, we
have a clear understanding of what we are dealing with. The basic setup we have
suggested is very generic and easy to create. It does not require too many installations
at the initial stage; the tools which will be used later will be described as they
come into play. In the forthcoming chapter we will be diving deep into the details of
open source intelligence.
CHAPTER
Open Source Intelligence
and Advanced Social
Media Search
INFORMATION IN THIS CHAPTER
• Open source intelligence (OSINT)
• Web 2.0
• Social media intelligence (SOCMINT)
• Advanced social media search
• Web 3.0
INTRODUCTION
As we already covered the basic yet essential terms in some detail in the previous
chapter, it's time to move on to the core topic of this book, that is open
source intelligence, also known by its acronym OSINT; but before that we need to
recognize how we see the information available in public and to what extent we see it.
For most of us the internet is limited to the results of the search engine of our choice.
A normal user who wants some information from the internet directly goes to a
search engine; let's assume it's one of the most popular search engines, Google, and
puts in a simple search query. A normal user, unaware of the advanced search
mechanisms provided by Google or its counterparts, puts in simple queries he/she
feels comfortable with and gets a result out of them. Sometimes it becomes difficult
to get the information from a search engine due to poor formation of the input queries.
For example, if a user wants to troubleshoot a Windows blue screen error,
he/she generally enters in the search engine query bar "my laptop screen is gone blue
how to fix this"; now this query might or might not get the desired result
on the first page of the search engine, which can be a bit annoying at times. It's quite
easy to get the desired information from the internet, but we need to know from where
and how to collect that information efficiently. A common misconception among
users is that their preferred search engine has the whole internet inside it, but in
reality search engines like Google have only a minor portion of the internet
indexed. Another common practice is that people don't go to the results on page two
of a search engine. We have all heard the joke that "if you want to hide a
dead body then Google results page two is the safest place." So we want all our
readers to clear their minds of such notions before proceeding to the topic.
Hacking Web Intelligence. http://dx.doi.org/10.1016/B978-0.12-801867-5.ft0002-l J 5
Copyright © 2015 Elsevier Inc. All rights reserved.
OPEN SOURCE INTELLIGENCE
Simply stated, open source intelligence (OSINT) is the intelligence collected from
sources which are openly present in public. As opposed to most other intelligence
collection methods, this form does not utilize information which is covert and
hence does not require the same level of stealth in the process (though some stealth
is sometimes required).
OSINT comprises various public sources, such as:
• Academic publications: research papers, conference publications, etc.
• Media sources: newspaper, radio channels, television, etc.
• Web content: websites, social media, etc.
• Public data: open government documents, public company announcements, etc.
Some people don't give much heed to this, yet it has proven its importance time
and again. Most of the time it is very helpful in providing context to the
intelligence gathered from other modes, but that's not all; in many scenarios it has been
able to provide intelligence which can directly be used to make a strategic decision.
It is thought to be one of the simplest and easiest modes by many, if not most, yet it
does have its difficulties; one of the biggest, and unique among all, is the abundance of
data. Where other forms of intelligence starve for data, OSINT has so much data that
filtering it and converting it into an actionable form is the most challenging part.
OSINT has long been used by governments and the military as well as the
corporate world to keep an eye on the competition and to have a competitive advantage
over them.
As we discussed, for OSINT there are various public sources from which
we can collect intelligence, but during the course of this book we will be focusing
on the part which only uses the internet as its medium. This specific type of OSINT is
called WEBINT by many, though this seems a bit ambiguous, as there is a
difference between the internet and the web (discussed in Chapter 1). It might look like
by focusing on a specific type we are missing a huge part of OSINT, which would
have been correct a few decades earlier, but today, when most data are digitized,
this line of difference is slowly thinning. So for the sake of understanding we will be
using the terms WEBINT and OSINT interchangeably in this book.
HOW WE COMMONLY ACCESS OSINT
SEARCH ENGINES
Search engines are one of the most common and easy methods of utilizing OSINT.
Every day we make hundreds of search queries in one or more search engines,
depending upon our preference, and use the search results for some purpose. Though
the results we get seem simple, there is a lot of backend indexing going on, based
on complex algorithms. The way we create our queries makes a huge difference in the
accuracy of the results that we actually seek from a search engine. In a later chapter
we will discuss how to craft our queries so that we can get precisely the results
we desire. Google, Yahoo, and Bing are well-known examples of search engines.
Though it seems like search engines have lots of information, they only index
the data which they are able to crawl through programs known as spiders or robots.
The part of the web these spiders are able to crawl is called the surface web; the
rest is called the dark web or darknet. The darknet is not indexed as it
is not directly accessible via a link. An example of darknet content is a page generated
dynamically using the search option on a web page. We will discuss the darknet and
associated terms in a later chapter.
NEWS SITES
Earlier, the popular mediums of news were newspapers, radio, and television, but
advancements in internet technology have drastically changed the scenario, and
today every major news vendor has a website where we can get all the news in a
digital format. Today there even exist news agencies which run only online. This
advancement has certainly brought news to our fingertips, anytime and
anywhere there is an internet connection available. For example, bbc.com is
the news website of the well-known British Broadcasting Corporation.
Apart from news vendors, there are sites run by individuals or groups as well,
and some of them focus on topics which belong to specific categories. These sites
are mainly present in the form of blogs, online groups, forums, IRC (Internet Relay
Chat), etc. and are very helpful when we need the opinion of the masses on a specific
topic.
CORPORATE WEBSITES
Every major corporation today runs a website. It's not just a way to present its
existence but also to interact directly with customers, understand their behavior, and
much more. For example, www.gm.com is the corporate website of General Motors.
We can find a plethora of information about a company from its website. Usually a
corporate website contains information like the key players in the organization, their
e-mails, the company address, company telephone numbers, etc., which can be used
to extract further information.
Today some corporate websites also provide information in the form of
white papers, research papers, corporate blogs, newsletter subscriptions, current
clients, etc. This information is very helpful in understanding not only the current
state of the company but also its future plans and growth.
CONTENT SHARING WEBSITES
Though there are various types of user-generated content out there, containing
an amalgam of text as well as various multimedia files, there are some
sites which allow us to share a specific type of content, such as videos, photos, art, etc.
These types of sites are very helpful when we need a specific type of media related to
a topic, as we know exactly where to find it. YouTube and Flickr are good examples
of such sites.
ACADEMIC SITES
Academic sites usually contain information on specific topics, research
papers, future developments, news related to a specific domain, etc. In most cases
this information can be very crucial in understanding the landscape for current
as well as future development. Academic sites are also helpful in learning the traits
associated with our field of interest and in understanding the correlations in
between.
The information provided on academic sites is very helpful in understanding
the developments taking place in a specific domain and in getting a glimpse
of our future. They are helpful not only in understanding the current state of
development but also in generating ideas based upon it.
BLOGS
Weblogs, or blogs, started as a digital form of the personal diary, except that they are
public. Usually people used to write blogs in a simple way to express their views on
some topics of interest, but this has changed in the past decade. Today there are
corporate blogs, which talk about the views of the company and can reveal a lot about
its pursuits; there are blogs on specific topics which can be used to learn about the
topic; there are blogs related to events; etc.
[Figure 2.1: A blog on bylogger.in.]
Blogs reveal a lot not just about the topic written about, but also about the author.
In many job applications it is desirable for the applicant to have a blog, as it can be
used to understand his/her basic psychological profile, communication skills, command
over the language, etc.
GOVERNMENT SITES
Government sites contain a huge amount of public data. This includes not just
information about the government but also about the people it serves. Usually there
are government sites which contain information about registered companies, their
directors, and other corporate information; there are sites which contain
information about specific departments of the government; there are also sites where we
can complain about public issues and check the status of our complaints; etc.
From a geopolitical perspective, government sites can be a good source of
information about the development of a country, its current advancements, its future
plans, etc.
So this is how we usually interact with the internet today, but it was not
always like this. There were no blogs, no social media, no content sharing, etc.,
so how did we get here? Let's see.
WEB 2.0
Earlier, websites used to be mainly static; there was not much to interact with. Users
simply used to open web pages and go through the text and images, and that was
pretty much it. Around the late 1990s, the web started to take a new form. Static
pages were being replaced by user-generated content. Websites became interactive,
and people started to collaborate online. This was the advent of Web 2.0.
Web 2.0 drastically changed the way the web was interacted with. Earlier, the
content shared by webmasters was the only information one could access; now
people could post data on the web, and opinions were being shared and challenged. This
changed the way information was generated; now there were multiple sources to
confirm or discredit a piece of data. People could share information about themselves,
their connections, their environment, and everything they interacted with.
Now people were not just the viewers of the content of the web but its creators.
The ability to interact and collaborate allowed people to create new platforms
for information sharing and for connecting in the virtual world. Platforms like
content sharing portals, social networks, weblogs, wikis, etc. started to come into
existence. The virtual world slowly started to become our second home and a source
of a plethora of information which would not have existed earlier.
This virtual world is now our reality. The ability to create content here allows us
to share whatever information we want: our personal information, our professional
information, our feelings, our likes/dislikes, and what not. Here we can tell others
about ourselves and at the same time learn about others. We can share our views
about anything and understand how other people perceive those issues. It allows us
to interact with the whole world while sitting in one small corner of it.
Today on these social platforms of ours it's not just individuals who exist;
there is much more. There are people in the form of communities and/or groups; there
are pages of political parties, corporates, products, etc. Everything we used to deal
with in real life is being replicated in the virtual world. This certainly has brought the
world closer in a sense, and it does affect our lives.
The web at its current stage is not only a part of our life but also influences it.
By sharing our feelings, desires, and likes/dislikes online we let others know about us
and understand our personality, and vice versa. Similarly, the content posted online plays
a huge role in our decision making. The advertisements we see online are
personalized, depending upon our online behavior, and these ads influence what we buy. Be
it a political hashtag on Twitter or a viral video on YouTube, we process a lot of
online data daily, and it does make a difference in our decisions.
Today the web has evolved to a level where there is an abundance of data, which is a
good thing, as it increases the probability of finding answers to our questions.
The issue is how to extract relevant information out of this mammoth, and
this is exactly what we will be dealing with in this book, starting from this chapter.
SOCIAL MEDIA INTELLIGENCE
Social media is an integral part of the web as we know it. It is mostly where all the
user-generated content resides. Social media intelligence, or SOCMINT, is the name
given to the intelligence collected from social media sites. Some of these may
be open, accessible without any kind of authentication, and some might require some
kind of authentication before any information can be fetched. Due to its partially closed
nature some people don't count it as a part of OSINT, but for the sake of simplicity
we will consider it so.
Some social media types are:
• Blogs (e.g., Blogger)
• Social network websites (e.g., Facebook)
• Media sharing communities (e.g., Flickr)
• Collaborative projects (e.g., Wikipedia)
As we now have a clear idea about OSINT as well as social media from its
perspective, let's move on to understand one of the integral parts of social media and a
common source of information sharing, i.e., social networks.
SOCIAL NETWORK
A social network website is a platform which allows its users to connect with each other
depending upon their areas of interest, the location they reside in, real life relations, etc.
Today they are so popular that almost every internet user has a presence on one or more
of them. Using such websites we can create a social profile of our own, share updates,
and also check the profiles of other people in whom we have some form of interest.
Some of the common features of social network websites are:
• Share personal information
• Create/join a group of interest
• Comment on shared updates
• Communicate via chat or personal message
Such websites have been very helpful in connecting people across
boundaries, building new relations, sharing ideas, and much more. They are also very
helpful in understanding an individual: their personality, ideas, likes/dislikes,
and what not.
INTRODUCTION TO VARIOUS SOCIAL NETWORKS
There are several popular social network sites where we are already registered, but
why are there so many different social network sites; why not just a couple of them?
The reason is that different social networks focus on different aspects of life. Some
focus on generic real life relations and interests, like Facebook, Google+, etc. Some
focus on the business or professional aspect, like LinkedIn, and some on
microblogging or quick sharing of views, like Twitter. There are many more popular social
networks with different aspects, but in this chapter we will restrict ourselves to some of
the popular ones, which are:
• Facebook
• LinkedIn
• Twitter
• Google+
[Figure 2.2: Social network sites.]
Facebook
Facebook is one of the most popular and widely used social network sites. It was
founded on February 4, 2004 by Mark Zuckerberg with his college roommates.
Initially Facebook was restricted to Harvard University students, but now it's open
for anyone above the age of 13 to register and use, though no proof of age is required.
Among all the social network sites it has the widest age-group of audience due
to some of its popular features and its generic appeal. Currently it has over a billion
active users worldwide and adds over half a petabyte of data every 24 h.
It allows creating a personal profile where a user can provide details like work and
education, personal skills, relationship status, and family member details; basic
information like gender, date of birth, etc.; contact information like e-mail id, website
details, etc.; and also life events. It also allows creating a page for personal or
business use which can be used as a profile. We can also create groups, join groups
of our interest, add other Facebook users based on relations or common interests,
and categorize our friends. We can like something, comment on something, share
what we feel as a status, check in where we have been, share what we are doing right
now, and add pictures and videos. We can also exchange messages with someone or
with a group, publicly or privately, and chat with someone. Adding notes, creating
events, and playing games are some of its other features.
Now you might be thinking why we are sharing all this information, because
as Facebook users most of us are aware of all these things. The reason to put a light
on these features is that it will help us in OSINT. As we discussed earlier, Facebook
adds over half a petabyte of data every 24 h, it has more than a billion active users,
and it allows users to share almost everything; combining these three statements
we can say Facebook contains petabytes of structured data on over a billion users:
what a user likes, a user's basic information such as his/her name, age, gender,
current city, hometown, work status, relationship status, current check-ins where he/
she visited recently; everything here is a treasure trove for any information gathering
exercise. Now, though we mostly don't use Facebook for hardcore intelligence
purposes, we still sometimes use its search facility to search for some person or
page, etc. Say one day we remember a school friend and want to search for
him/her on Facebook, so we search for his/her name, or for the name along with the
school name, to get the result. Another option is that if there is a group for the
schoolmates, we can directly go there and search for the friend.
Based on our preferences, location, school, college, colleagues, and friends,
Facebook also recommends friends to us with its "people you may know" option. This
option also helps a lot in searching for someone on Facebook. We will cover advanced
ways of searching on Facebook in an upcoming topic.
Facebook does allow setting privacy on most of the things mentioned above, like
whom you want to share this information with: public, friends and friends of
friends, only friends, or only me. It also allows users to block inappropriate content or
users and to report spam and inappropriate content. But guess what: most of us are
unaware of these functionalities or simply ignore them.
Linkedln
If you are a job seeker, jobholder, job provider, or business person, Linkedln is the
best place to stay active. It can be called as professional network where people are
mostly interested in business-oriented stuffs. It has more than 259 million members
in over 200 countries and territories.
LinkedIn allows us to register and create a profile. The profile basically consists of name, company name, position, current work location, current industry type, etc. We can also add details about our work, such as job position and responsibilities, along with educational details, honors and awards, publications, certificates, skills, endorsements, projects undertaken, and languages known: almost our complete professional life. Apart from that, LinkedIn also allows us to add personal details such as date of birth, marital status, interests, and contact information, which are of concern to certain employers.
Like Facebook, it also allows us to connect with other users with whom we share interests or some level of relationship. To maintain professional decorum, LinkedIn restricts us from inviting others if too many of our connection requests have been answered with "I don't know" or marked as spam. Similar to Facebook, there are also different groups on LinkedIn that we can join to share common interests. It also provides features to like, comment, and share whatever we want, and to communicate with other connections via private message. One simple yet rich feature of LinkedIn over Facebook is that while Facebook only shows the mutual friends between two users, LinkedIn shows how we are connected with a particular user just by visiting his or her profile. It also shows what we have in common, so we can easily understand how, and to what extent, the other user is similar to us. Another major difference is that on LinkedIn, if we sneak into someone's profile, that user gets to know that someone has viewed it. Though this can be set to partially or fully anonymous using the privacy settings, it is still a very good feature for a professional network. Let's say we are job seekers and a recruiter just peeked into our profile; we can then expect a job offer. Like Facebook, LinkedIn also allows us to set privacy options on almost everything.
LinkedIn is a great place for job seekers as well as job providers. The profile can serve as a bio-data/resume or CV, and recruiters can directly search for candidates based on the required skill set. It also has a job page where we can search or post jobs. Recruiters can search for candidates based on industry type or the companies they follow, while job seekers can search for jobs based on location, keyword, job title, or company name.
Now, from an OSINT perspective, LinkedIn, like Facebook, holds a lot of structured information, or rather structured professional information, about a particular user or company: full name, current company, past experience, skill sets, industry type, other employee details, company details, etc. Using some advanced search techniques we can collect all this information efficiently, as we will discuss soon.
CHAPTER 2 OSINT and Advanced Social Media Search
Twitter
Twitter is a microblogging type of social network. It allows us to read short messages of 140 characters or less, known as tweets, without registration; after logging in we can both read and compose tweets. It is also known as the SMS of the internet.
Unlike other social network sites, Twitter has a very diverse user base. Nowadays Twitter is considered the voice of a person: tweets are treated as statements and become parts of news bulletins, etc. The major reason it is considered the voice of a person is its verified accounts. Account verification is a feature of Twitter that allows celebrities or public figures to show the world that an account is the real one, though sometimes they also verify their accounts just to maintain control over the accounts that bear their names.
Like other social networking sites, when we register on Twitter it allows us to create a profile, though it contains very limited information: name, Twitter handle, a status message, website details, etc. A Twitter handle is like a username that uniquely identifies us on Twitter; when we want to communicate with each other we use this handle. A Twitter handle generally starts with an @ sign followed by some alphanumeric characters without spaces, for example, @myTwitterhandle. Twitter allows us to send a direct message to another user privately via messages or publicly via tweets. It also allows us to group tweets or topics by using the hashtag "#". A hashtag is used as a prefix of a word or phrase, such as #LOL, which is generally used to file a tweet under the funny category.
A word, phrase, or topic that is tagged heavily within a time period is said to be a trending topic. This feature lets us know what is happening in the world. Twitter allows us to follow other users. We can tweet, or simply share someone else's tweet, which is known as retweeting. It also allows us to favorite a tweet. Like other social network sites, it lets us share images and videos (with certain restrictions). Tweet visibility is public by default, but users can restrict their tweets to just their followers if they want. Twitter is nowadays popularly used for making announcements, giving verdicts or statements, or replying to something online. The tweets of a verified account are taken as direct statements of that person. Corporates use it for advertising, self-promotion, and/or announcements.
Unlike the two social networks we discussed earlier, Twitter does not contain much personal or professional data, yet the information it provides is helpful. We can collect information about social mentions. For example, if we want details about infosec bug bounties, we can search Twitter with the corresponding hashtag and get lots of related tweets, from which we can learn which companies run bug bounties, which new bug bounties have started, who is participating in them, and so on. Unlike other social network sites, Twitter has a large amount of structured information based on phrases, words, and topics.
Google+
Google+, also known as Google Plus, is a social networking site by Google Inc. It is also known as an identity service, which allows us to associate directly with the web content we create through it. It is the second largest social networking site after Facebook, with a huge number of registered and active users. As Google provides various services such as Gmail, the Play store, YouTube, Google Wallet, etc., the Google+ account can be used as a background account for these.
Like the other social networking sites we just came across, Google+ also allows us to register, but the advantage Google+ has over other social networking sites is that most Gmail (the popular e-mail solution by Google) users automatically become part of it with just a click. As on other social network sites, we can create a profile containing basic information like name, educational details, etc.
Unlike other social networking sites, a Google+ profile is a public profile by default. It allows video conferencing via Google Hangouts and lets us customize our profile page by adding links to other social media properties we own, like a blog.
We can consider it one background solution for many Google services, yet it has its own demerits. Many users have one or more Gmail accounts that they actively use, and in the case of Google+ they might have the same number of accounts but can use only one as an active account. So there is a chance that the ratio of active users to total registered accounts is very low compared to other social networking sites.
Like its competitors, Google+ also allows us to create and join communities, follow or add friends, and share photos, videos, or locations, but one feature that makes Google+ stand out is its +1 button. It is quite similar to the Like button on Facebook, but the added advantage is that when the +1 count is higher for a topic or a link, its PageRank in Google increases as well.
Now for the OSINT aspect of Google+: like other social networking sites, Google+ also has a huge amount of structured data about its users. Another feature that makes Google+ a good source for information gathering is that the profiles are public, so no authentication is required to get information. One more advantage of Google+ over other social sources is that it is a one stop solution: here we can get information about all the Google content a user is contributing, or at least about the other Google services a user is using. This can be a treasure trove.
ADVANCED SEARCH TECHNIQUES FOR SOME SPECIFIC
SOCIAL MEDIA
Most of the social media sites provide some kind of search functionality to let us search for the things or people we are interested in. Used a bit more smartly, these functionalities can uncover hidden or indirect but important information, thanks to the structured storage of user data on these platforms.
FACEBOOK
We already discussed how Facebook can be a treasure box for information gathering. One functionality that helps us get very precious information is Facebook graph search.
Facebook graph search is a unique feature that enables us to search for people or things that are somehow related to us. We can use graph search to explore locations, places, and photos, and to search for different people. It has its own way of suggesting what we might want to search for based on the first letters or words we type. It starts searching for an item in the different categories of Facebook itself, such as people, pages, groups, and places, and if sufficient results are not found it extends the search to the Bing search engine to provide the user with enough results. To return the most relevant results, Facebook also looks into our relations, areas of interest, and past activity; for example, things that we or our friends have liked, commented on, shared, been tagged in, checked in to, or viewed rank higher in the results. We can also filter the results based on social elements such as people, pages, places, groups, apps, events, and web results. The technology Facebook uses in its graph search can be seen as a foundation of the semantic web, which we will discuss at the end of this chapter.
Now that we have learned about the feature that allows us to search for different things on Facebook, the question is still: how? Let's start with some simple queries.
Just put "photos" in the search bar and Facebook will suggest queries such as photos of my friends, photos liked by me, my photos, photos of X, etc. Similarly, we can get lots of photo-related queries, or we may create our own. Building on photos, we can ultimately get to a query such as "Photos taken in Bangalore, India commented on by my friends who graduated before 2013 in Bhubaneswar, India." It is basically down to our own imagination: we decide what exactly we want to retrieve, then, based on keywords, we create complex queries to get the desired results, though Facebook will also suggest some unexpected queries based on the keywords in the search bar. Similarly we can search for persons, locations, restaurants, employees of a particular company, music, etc.
Some basic queries related to different social elements are as follows:
1. Music I may like
2. Cities my friends visited
3. Restaurants in Macao, China
4. People who follow me
5. Single females who live in the United States
6. My friends who like X-Men movies
7. People who like football
Now let's combine some of these simpler queries to create a complex one: "Single women named 'Rachel' from Los Angeles, California who like football and Game of Thrones and live in the United States." Isn't it amazing? We can build queries using filters based on basic information such as name, age, and gender; on work and education such as class, college, passing year, and degree name; on likes and dislikes; on being tagged in or having commented on something; on place of living; and also on relationships. It is our wild imagination that leads us to create the different queries that get the desired results.
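As a rough illustration, such a natural-language graph search query can also be composed programmatically and handed to Facebook's generic search URL. The `/search/top/?q=` endpoint used here is an assumption based on the search URLs visible in a browser, not a documented API, and the helper name is our own:

```python
from urllib.parse import quote_plus

def graph_search_url(query):
    """Build a Facebook search URL that pre-fills a natural-language
    graph search query (the q parameter is an assumption; the exact
    URL scheme may change as Facebook evolves)."""
    return "https://www.facebook.com/search/top/?q=" + quote_plus(query)

# Compose a complex query out of simpler filters, as described above.
filters = [
    "Single women named 'Rachel'",
    "from Los Angeles, California",
    "who like football and Game of Thrones",
    "and live in the United States",
]
url = graph_search_url(" ".join(filters))
```

Opening the resulting URL in a logged-in browser session shows the same results as typing the query into the search bar.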
[Screenshot: Facebook graph search results for "Single women named 'Rachel' from Los Angeles, California who like Football and Game of Thrones and live in the United States," with a note that relationship-based searches are still being built.]
FIGURE 2.3
Facebook graph search result.
LINKEDIN
We have discussed how LinkedIn holds structured data on millions of users and what we can get if we search for something in particular; now let's see how to search this platform. LinkedIn provides a search bar at the top to search for people, jobs, companies, groups, universities, articles, and more. Unlike Facebook, LinkedIn has a dedicated advanced search page where we can add filters to get efficient results. The following is the link to LinkedIn's advanced search page:
https://www.linkedin.com/vsearch/p?trk=advsrch&adv=true
[Screenshot: the LinkedIn Advanced People Search page, showing filters such as Relationship (1st Connections, 2nd Connections, Group Members, 3rd + Everyone Else), Location, Current Company, Industry, Past Company, School, Profile Language, Nonprofit Interests, and Groups, with premium-only filters on the right.]
FIGURE 2.4
LinkedIn advanced search options.
This advanced search page allows us to search for jobs and people based on current employer, past employer, job title, zip code radius, interests, industry type, etc. It also allows us to search based on the type of connection.
Different input boxes and their uses:
• Keyword
The keyword input box allows a user to insert any type of keyword, such as pentester or author.
• First Name
We can search using a first name.
• Last Name
We can search using a last name.
• Title
Title generally refers to the job title. This field comes with a drop-down menu offering four options, current or past, current, past, and past not current, to enrich the search.
• Company
We can search using a company name. It also comes with a drop-down menu with the same options we just discussed.
• Location
This drop-down box comes with two options, i.e., located in or near, and anywhere. Users can use whichever they want.
• Country
Search based on country.
• Postal Code
Search based on postal code. A lookup button is present so the user can check whether the entered postal code matches the desired location. Entering a postal code automatically enables a "within" drop-down box, which contains the following options to choose from:
1. 10 mi (15 km)
2. 25 mi (40 km)
3. 35 mi (55 km)
4. 50 mi (80 km)
5. 75 mi (120 km)
6. 100 mi (160 km)
This can be used to select the radius around the postal code to include in the search.
• Relationship
This checkbox group enables searching direct connections, connections of connections, group members, and everyone. The user can enable the final option, i.e., 3rd + Everyone Else, to search everything.
• Location
This option is for adding another location in addition to the one already specified via the postal code.
• Current Company
This option allows a user to add current company details manually.
• Industry
It lets a user choose one or more industries at a time.
• Past Company
This option allows adding past company details manually.
• School
Similar to past company, we can add details manually.
• Profile Language
It lets a user choose one or more languages at a time.
• Nonprofit Interests
It lets a user choose either board service or skilled volunteering, or both.
The options present on the right side of the advanced search page are only for premium account members; other added functionality is also reserved for premium users. The premium member search filter options are:
• Groups
• Years of Experience
• Function
• Seniority Level
• Interested In
• Company Size
• Fortune
• When Joined
Apart from all these, LinkedIn also allows us to use Boolean operators. Below are the operators with simple examples:
• AND: It requires both keywords to be present, such as developer AND tester.
• OR: It can be used for alternatives. Let's say a recruiter wants to hire for the security industry; he/she can search for something like Pentester OR "Security Analyst" OR "Consultant" OR "Security Consultant" OR "Information Security Engineer".
• NOT: This can be used to exclude something. Let's say a recruiter wants a fresher-level person for some job, but not from the training domain; he/she can use developer NOT trainer.
• (Parentheses): This is a powerful operator with which a user can group terms, such as (Pentester OR "Security Analyst" OR "Consultant" OR "Security Consultant" OR "Information Security Engineer") NOT Manager.
• "Quotation marks": They can be used to treat more than one word as a single keyword, such as "Information Security Engineer". If we use the same words without quotation marks, LinkedIn will treat them as three different keywords.
Unlike search engines, which accept a limited number of keywords in the search box, LinkedIn allows unlimited keywords, which is a major plus for recruiters searching for skill sets and other job requirement keywords. It gives users the freedom to use any number of keywords, combined wisely with operators, to create a complex query that gets the desired results.
Example of a complex query to look for information security professionals who are not managers:
((Pentester OR “Security Analyst” OR “Consultant” OR “Security Consultant”
OR “Information Security Engineer”) AND (Analyst OR “Security Engineer” OR
“Network Security Engineer”)) NOT Manager.
[Screenshot: LinkedIn shows 619,387 results for the query ((Pentester OR "Security Analyst" OR "Consultant" OR "Security Consultant" OR "Information Security Engineer") AND (Analyst OR "Security Engineer" OR "Network Security Engineer")) NOT Manager.]
FIGURE 2.5
LinkedIn advanced search result.
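Queries like the one above can be assembled by a small helper instead of being hand-written. This is just an illustrative sketch with function names of our own; only the Boolean syntax itself comes from LinkedIn:

```python
def quoted(term):
    """Quote multi-word terms so LinkedIn treats them as one keyword."""
    return '"%s"' % term if " " in term else term

def any_of(*terms):
    """OR-group: match profiles containing at least one of the terms."""
    return "(" + " OR ".join(quoted(t) for t in terms) + ")"

def build_query(include, exclude=None):
    """AND together the given groups, then append a NOT exclusion."""
    q = " AND ".join(include)
    if len(include) > 1:
        q = "(" + q + ")"
    if exclude:
        q += " NOT " + quoted(exclude)
    return q

query = build_query(
    [any_of("Pentester", "Security Analyst", "Consultant",
            "Security Consultant", "Information Security Engineer"),
     any_of("Analyst", "Security Engineer", "Network Security Engineer")],
    exclude="Manager",
)
```

The resulting string can be pasted straight into the LinkedIn search bar; the helper simply keeps the quoting and parenthesization consistent as queries grow.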
TWITTER
As we discussed earlier, Twitter is basically about microblogging in the form of tweets, and hence it allows us to search for tweets. Simply inputting a keyword will get us tweets related to that keyword, but if we need more specific results we need to use some advanced search operators. Let's get familiar with some of them.
To search tweets for a specific phrase we can use quotation marks; for example, to search for the phrase pretty cool the query would be "pretty cool". To look for a hashtag we can simply type the hashtag itself (e.g., #hashtag). If we want to search for a term but exclude another specific term, we can use the - operator; say we want to search for hack but don't want the term security, then we can use the query hack -security. If we want results containing either one or both of the terms, we can use the OR operator, such as hack OR security. To look for results related to a specific Twitter account, we simply search by its Twitter handle (@Sudhanshu_C). The filter operator can be used to get specific types of tweet results; for example, to get tweets containing links we can use filter:links. The from: and to: operators can be used to filter results by sender and receiver respectively, e.g., from:sudhanshu_c, to:paterva. Similarly, since: and until: can be used to specify the timeline of the tweet, e.g., hack since:2014-01-27, hack until:2014-01-27. All these operators can be combined to get better and much more precise results. To check out other features we can use the Twitter advanced search page at https://twitter.com/search-advanced, which has some other exciting features such as a location-based filter.
[Screenshot: the Twitter advanced search page at https://twitter.com/search-advanced.]
FIGURE 2.6
Twitter advanced search options.
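Because these operators are plain text, a script can assemble them and hand the result to Twitter's public search URL (https://twitter.com/search?q=...). A minimal sketch:

```python
from urllib.parse import quote_plus

def twitter_search_url(*operators):
    """Join advanced-search operators into one query string and
    URL-encode it for Twitter's public search endpoint."""
    return "https://twitter.com/search?q=" + quote_plus(" ".join(operators))

# Tweets containing links about "hack", excluding "security",
# sent on or after 27 January 2014:
url = twitter_search_url("hack", "-security", "filter:links",
                         "since:2014-01-27")
```

Each operator stays a separate argument, so individual filters can be added or dropped without string surgery.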
SEARCHING ANY OPEN SOCIAL MEDIA WEBSITE
So we learned about social networks and how to search some of them, but what about the platforms we need to search that don't support any of the advanced search features we discussed? Don't worry, we have you covered: there is a simple Google search trick that helps out, the site operator. A Google search operator is simply a way to restrict the results provided by Google within a specific constraint. The site operator restricts the search results to a specific website only. For example, if we want to search for the word "hack" but only want results from the Japanese Wikipedia website, the query we input into Google would be site:ja.wikipedia.org hack. This gives results for the word hack from the site we specified, i.e., ja.wikipedia.org. If we want to search multiple platforms at once, another Google operator comes in handy: the OR operator. It returns results for either of the keywords mentioned before and after it. Combined with the site operator, it lets us get search results from a set of specific platforms. For example, to search for the word "hack" on Facebook as well as LinkedIn, the Google query would be site:facebook.com OR site:linkedin.com hack. As we can see, these operators are simple yet very effective. We will learn more about such operators for Google, as well as for some of the lesser known yet efficient search engines, in the coming chapters.
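A quick way to assemble such scoped queries for any number of sites, sketched in Python (the helper name is our own):

```python
from urllib.parse import quote_plus

def site_query(keyword, *sites):
    """Restrict a Google search for `keyword` to the given sites by
    chaining site: operators with OR."""
    scope = " OR ".join("site:" + s for s in sites)
    return "%s %s" % (scope, keyword)

query = site_query("hack", "facebook.com", "linkedin.com")
# URL-encode the query for Google's search endpoint:
url = "https://www.google.com/search?q=" + quote_plus(query)
```

Adding a third platform is just one more argument, which makes this handy when sweeping a keyword across many social sites at once.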
WEB 3.0
We discussed Web 2.0, its relevance, how it affects us, and how to navigate through some of the popular social networks; now let's move forward and see where we are heading. Until now, most of the data available on the web has been unstructured. Though various search engines like Google, Yahoo, etc., continuously index the surface web, the data itself has no standard structure: there is no common data format followed across the entire web. The problem is that though search engines can guide us to the information we are looking for, they can't help us answer complex queries or a sequence of queries. This is where the semantic web comes in. The semantic web is basically a concept in which the web follows a common data format that gives meaning to the data. Unlike Web 2.0, where human direction is required to fetch specific data, in the semantic web machines will be able to process the data without any human intervention. It will allow data to be interlinked not just by hyperlinks but by meaning and relations. This will allow not only data sharing but also processing across boundaries: machines will be able to relate data from different domains and generate meaning out of it. The semantic web is a crucial part of the web of the future, Web 3.0, which is hence also referred to by many as the semantic web.
Apart from the semantic web there are many other aspects that will contribute toward Web 3.0, such as personalized search, context analysis, sentiment analysis, and much more. Some of these features are already becoming visible in parts of the web; they might not be mature yet, but the evolution is rapid and quite vivid.
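To make the "common data format" idea concrete, here is what linked data can look like: a person described with the schema.org vocabulary in the JSON-LD style, expressed as a Python dictionary. The particular fields chosen are illustrative:

```python
import json

# A person described as linked data: every field has a shared meaning
# defined by the schema.org vocabulary, so machines on different
# domains can process it without human interpretation.
person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Rachel",
    "homeLocation": {"@type": "City", "name": "Los Angeles"},
    "knowsAbout": ["football", "Game of Thrones"],
}

document = json.dumps(person, indent=2)
```

A machine that understands schema.org can answer "which city does this person live in?" directly from the `homeLocation` relation, which is exactly the kind of complex query a keyword search engine cannot resolve.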
Understanding Browsers
and Beyond
3
INFORMATION IN THIS CHAPTER
• Browser’s basics
• Browser architecture
• Custom browsers
• Addons
INTRODUCTION
In the first chapter we discussed web browsers in general, put some light on different popular browsers such as Chrome and Firefox, and tried to simplify the process behind browsing. Now it's time to understand what exactly happens in the background. You might wonder why this is required. The reason to focus on browsers and discuss their different aspects in detail, having covered some of the basics earlier in this book, is that the majority of tools we will use in the course of this book are web based, and to communicate with those web-based tools we will use browsers a lot. That's why it is very important to understand how a browser works and what exactly goes on in the background when we do something in it. Learning the internal process of how a browser operates will help us choose and use one efficiently. Later we will also learn about ways to improve the functionality of our daily browsers. Now, without wasting more time on definitions and descriptions we already covered, let's get directly to the point: the secrets of browser operation.
BROWSER OPERATIONS
When we open a browser, we generally find an address bar where we can insert the web address we want to browse; a bookmark button to save a link for future use; a show-bookmarks button, where we can see all the bookmark links we already have in the browser; back and forward buttons to browse pages accordingly; a home button to return from any page to the home page already set in the browser; and an options button to manage all the browser settings, such as the home
page, download location, proxy settings, and lots of other settings. The location of these buttons might change between versions to provide a better user experience, but somewhere in the interface of the browser you will find all of them for sure. (Hacking Web Intelligence. http://dx.doi.org/10.1016/B978.0-12-801867-5.ft0003-3. Copyright © 2015 Elsevier Inc. All rights reserved.)
All browsers have quite similar user interfaces, with most of the functionality common, as discussed above, but there are still some facilities and functionalities that make each browser unique. There are different popular browsers, such as Chrome, Firefox, IE, Opera, and Safari, but as discussed in Chapter 1, we will focus mostly on the two browsers that are also available in open source versions: Chrome and Firefox.
HISTORY OF BROWSERS
The first browser, written by Tim Berners-Lee in 1991, only displayed text-based results. The first user-friendly commercial graphical browser was Mosaic. To standardize web technology, an organization named the World Wide Web Consortium, also known as the W3C, was founded in 1994. Almost all of the major browsers came onto the market in the mid-1990s. Today browsers are much more powerful than they were in the early 1990s. The technology has evolved rapidly from text-only to multimedia and is still moving on; today browsers display different types of web resources, such as video, images, and documents, along with HTML and CSS. How a browser should display these resources is specified by the W3C.
BROWSER ARCHITECTURE
Browser architecture differs from browser to browser, so an architecture derived from their common components looks something like the following.
FIGURE 3.1
Browser architecture.
USER INTERFACE
The user interface here is what we have already discussed above: the buttons and bars that make the general features easily accessible.
BROWSER ENGINE
It is the intermediary between, or the combination of, the layout engine and the render engine. The layout engine is nothing but the user interface.
RENDERING ENGINE
It is responsible for displaying the requested web resources by parsing their contents. By default it can parse HTML, XML, and images; it uses different plugins and/or extensions to display other types of data, such as Flash and PDF.
There are different rendering engines, such as Gecko, WebKit, and Trident. The most widely used rendering engine is WebKit or one of its variants. Gecko and WebKit are open source rendering engines, while Trident is not. Firefox uses Gecko, Safari uses WebKit, Internet Explorer uses Trident, and Chrome and Opera use Blink, a variant of WebKit. Different rendering engines use different algorithms and have their own approaches to parsing a particular request. The best example supporting this statement is that you might have encountered a website that works only with a particular browser: the website was designed to be compatible with that browser's rendering engine, so it doesn't work well in other browsers.
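For scripting purposes, the browser-to-engine relationships above can be captured in a small lookup table (accurate as of the time of writing; engines change over time):

```python
# Illustrative mapping of major browsers to their rendering engines,
# as described in the text above.
RENDERING_ENGINES = {
    "firefox": "Gecko",
    "safari": "WebKit",
    "internet explorer": "Trident",
    "chrome": "Blink",
    "opera": "Blink",
}

def engine_for(browser):
    """Return the rendering engine for a browser name, or None if
    the browser is not in our table."""
    return RENDERING_ENGINES.get(browser.lower())
```

A table like this is handy when triaging "works only in browser X" reports: two browsers sharing an engine usually share the rendering quirk as well.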
NETWORKING
This is a major component of a browser; if it fails to work, all other activities fail with it. The networking component can be described as a socket manager that takes care of resource fetching. It is a whole package consisting of application programming interfaces, optimization criteria, services, etc.
UI BACKEND
It provides user interface widgets: drawing different boxes, fonts, etc.
JAVASCRIPT INTERPRETER
It is used to interpret and execute JavaScript code.
DATA PERSISTENCE
This subsystem stores all the data a browser needs to save, such as session data. It includes bookmarks, cookies, caches, etc. Browsers store cookies containing a user's browsing details, which are often used by marketing sites to push advertisements. Let's say we visited an e-commerce site wanting to buy a headphone but never bought it. From our browsing data, marketing sites will get this information and start pushing advertisements for the same product at us, perhaps from that same e-commerce site or others. This component definitely has its own importance.
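To see why persisted data matters for OSINT, here is a sketch of querying cookie data with SQL. It uses an in-memory database with a simplified schema loosely modeled on Firefox's cookies.sqlite and its moz_cookies table; the real schema has more columns and varies between versions:

```python
import sqlite3

# Build an in-memory database with a simplified, illustrative schema
# (the real moz_cookies table has many more fields).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE moz_cookies (host TEXT, name TEXT, value TEXT)")
db.execute("INSERT INTO moz_cookies VALUES (?, ?, ?)",
           (".example-shop.com", "last_viewed", "headphones"))

# The kind of query a marketer (or an investigator examining a
# browser profile) could run against persisted cookie data:
rows = db.execute(
    "SELECT host, name, value FROM moz_cookies WHERE host LIKE ?",
    ("%example-shop.com",),
).fetchall()
```

The headphone-advertising scenario above boils down to exactly this: a row in a cookie store tying a host to what the user looked at.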
ERROR TOLERANCE
All browsers have traditional error tolerance to accommodate well-known mistakes and avoid invalid syntax errors. Browsers have this unique ability to fix invalid syntax, which is why we never see invalid syntax errors in the results. Though different browsers fix these errors in different ways, all of them do it in one way or another.
THREADS
Almost every process is single threaded in all browsers; however, network operations are multithreaded, using between two and six parallel threads. In Chrome the tab process is the main thread, while in other browsers like Firefox and Safari the rendering process is the main thread.
BROWSER FEATURES
Web browsing is a very simple and generic term that we are all aware of, but are we aware of its importance? A web browser opens a window for us to browse all the information available on the web. Browsers can be used for both online and offline browsing. Online browsing is what we do regularly with an internet connection. Offline browsing means opening local HTML content in a browser. Modern browsers also provide features to save HTML pages for offline browsing. These features allow a user to read or go through something later without any internet connection; we have all used them at some time in our browsing experience. When we save a page for offline viewing, we might find that certain contents of the page are missing. The reason is that saving a page only saves the media directly available on that page; if the page pulls resources from other sites, those things will be missing in the offline view. Let's discuss some of the added functionalities provided by browsers.
PRIVATE BROWSING
Incognito is the term Chrome associates with private browsing, whereas Firefox simply calls it Private Browsing. It allows us to browse the internet without saving details of what we visit during that particular browsing session. We can use private browsing for online transactions, online shopping, opening official mails on public devices, and much more.
In Firefox and Chrome we can find this option next to the new window option. The shortcut key to open private browsing is Ctrl+Shift+P in Firefox and Ctrl+Shift+N in Chrome. The visible difference between a normal browsing window and a private browsing window is an extra icon in the title bar of the window: in Firefox it's a mask icon, whereas in Chrome it's a detective icon.
FIGURE 3.2
Firefox private browsing.
Private browsing will not save visited page details, form fill entries, search bar entries, passwords, download lists, cached files, temp files, or cookies. Data downloaded or bookmarks made during private browsing, however, will still be saved on the local system.
What does private browsing not provide?
It only keeps the user anonymous on the local system; the internet service provider, network admin, or web admin can still keep track of the browsing details, and it will not protect a user from keyloggers or spyware.
There is always an option to manually delete the data stored by a browser: simply click on the clear recent history button, select what needs to be deleted, and it's done.
AUTOCOMPLETE
Almost all browsers can be configured to save certain information such as form details and passwords. This feature has different names in different browsers, or is specific to different rendering engines; some of the names are Password Autocomplete, Form Pre-filling, Form Autocomplete, RoboForm, and Remember Password. Browsers give the user the freedom to configure whether to save this information or not; if yes, whether to show some kind of prompt or not, and what should be saved and in what form.
In Firefox, to avoid password storage, go to Menu → Options → Security → Uncheck "Remember passwords for sites." We can also store passwords in encrypted form using the browser configuration.
In Chrome, go to Menu → Settings → Show advanced settings → Under Passwords and forms uncheck "Enable Auto-fill to fill out web forms in a single click" and "Offer to save your web password."
Some web applications treat this feature as a vulnerability or possible security risk, so they add the attribute autocomplete="off" to the form or to the input boxes whose values they do not want a browser to save. Nowadays, however, most browsers either ignore this attribute or have stopped supporting it, and save all or some of the data depending on the browser configuration.
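The attribute in question is plain HTML. Below is a minimal sketch of a form using it, plus a small stdlib-based check for which tags opt out of autocompletion; the login form markup is an invented example, not taken from any real application:

```python
# A sketch showing the autocomplete="off" attribute in form markup,
# and a small check for it using Python's stdlib HTML parser.
from html.parser import HTMLParser

# The attribute can be set on the whole form or on individual inputs.
login_form = """
<form action="/login" method="post" autocomplete="off">
  <input type="text" name="user">
  <input type="password" name="pass" autocomplete="off">
</form>
"""

class AutocompleteCheck(HTMLParser):
    def __init__(self):
        super().__init__()
        self.disabled = []   # tags that opt out of autocomplete

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("autocomplete") == "off":
            self.disabled.append(tag)

checker = AutocompleteCheck()
checker.feed(login_form)
print(checker.disabled)   # ['form', 'input']
```

As the text notes, whether a browser actually honors the attribute depends on the browser and its version.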
PROXY SETUP
The proxy setup feature is another important feature provided by every browser. It allows a user to forward the requests made by a browser through an intermediate proxy. Most companies use some sort of proxy device to avoid data leakage, and those settings can be configured in the browser to limit or monitor the browsing process. Proxy options are also popularly used by penetration testers to capture the requests and responses sent and received by a browser; they generally run an interception proxy tool and configure its settings in the browser.
In day-to-day life, a proxy setup can also be used for anonymous browsing or for visiting pages that are restricted in our country. In that case a user just has to collect a proxy IP address and port number from another country where that site or content is available, then configure the same in a browser to visit those pages.
Proxy setup in Firefox
Go to Menu → Options → Advanced → Network → Connection → Settings → Manual proxy configuration and add the proxy here.
Proxy setup in Chrome
Go to Menu → Settings → Show advanced settings → Under Network click on Change proxy settings → Click on LAN Settings → Check "Use a proxy server for your LAN (These settings will not apply to dial-up or VPN connections)" and add your settings.
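The same manual proxy settings can be applied programmatically. A minimal sketch using Python's standard library, with a placeholder proxy address (127.0.0.1:8080, typical for a locally running interception proxy) standing in for a real server:

```python
# A sketch of doing programmatically what the browser's manual proxy
# configuration does: route requests through an intermediate proxy.
# The proxy address here is a placeholder, not a real server.
import urllib.request

proxy_address = "127.0.0.1:8080"   # e.g. a local interception proxy
                                   # as a penetration tester would run

# Equivalent of "Manual proxy configuration" in the browser UI.
handler = urllib.request.ProxyHandler({
    "http": f"http://{proxy_address}",
    "https": f"http://{proxy_address}",
})
opener = urllib.request.build_opener(handler)

# opener.open("http://example.com")  # every request would now go via the proxy

# Inspect the configured proxies without making a network call.
print(handler.proxies["http"])
```

The commented-out `opener.open` line is where requests would actually traverse the proxy once one is listening on that address.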
RAW BROWSERS
Specific browsers come by default with specific operating systems, such as Internet Explorer with Windows and Safari with Mac. Almost all browsers have versions available for different operating systems. But the widely used and popular browsers are not the ones that come preinstalled with an operating system but the ones that are open source and easily available for different operating systems, i.e., Mozilla Firefox and Google Chrome. Though Google Chrome is mostly used on the Windows operating system, its open source sibling, called Chromium, is commonly found preinstalled on many Linux distributions. As the name suggests, its features closely match those of the Google Chrome browser, with only small differences.
As we saw earlier, there are different browser rendering engines such as Gecko and WebKit; Chrome uses Blink, a variant of WebKit, and so does Chromium. The project started in 2008 and there are now more than 35 released versions. It is one of the most popular browsers in the open source community. The tab-as-main-process design we saw in Google Chrome comes from the Chromium project, which set out to build a lightweight, fast, and efficient browser, sometimes described as a shell for the web, by making each tab its own main process. Several other browsers have been released based on the Chromium source code; Opera, Rockmelt, and Comodo Dragon are some of the well-known ones.
One thing is clear from the above: if a browser is open source, the community will use its code to create other browsers by adding extra functionality, just as the Comodo Group added security and privacy features to Chromium and released it as Comodo Dragon. Firefox likewise has different custom versions. So let's call the base version the Raw browser and the others customized browsers.
WHY CUSTOM VERSIONS?
Custom versions are used for different purposes: to exploit the functionality of the Raw browser to the fullest, or in simple words to make better use of the features the Raw browser provides. Custom browsers can serve our custom requirements. Say we want a browser that helps us stay online 24/7 on social networking sites; we can either add different social network addons to the browser of our choice, or start from scratch and build a browser version that contains the required functionality. Similarly, if we are penetration testers or security analysts, we might want a browser tailored to performing different application security tests. A normal user might want a browser that keeps them anonymous while browsing so that no one can track what they visit; this too can be achieved by customizing a browser. A number of customized browsers are already available to serve these purposes, and we can likewise create our own. As the process is a bit too complex to be included in this chapter and would require some technical background to follow, we will not discuss it here; still, knowing that it is possible opens a new window, and those interested can take it up as a self-learning project.
The Chromium project has an official website, http://www.chromium.org, where we can find documentation and help materials for customizing the browser for different operating systems. The project is also maintained at SourceForge, http://www.sourceforge.net/projects/chromium. From there we can download the browser, download the browser source code, subscribe to the mailing list to get updated news about the project, and submit bugs and feature requests.
If you are interested in customizing Chromium, subscribing to the mailing list and exploring the documentation available on SourceForge makes a great kick start. The first step in customizing any browser is getting its source code. So how do we get the source code of Chromium? It's quite easy: we just need to download the latest tar or zip archive. After untarring or unzipping it, we get the source code along with the documentation.
Now let's move on to discuss some already customized browsers and their functionalities.
SOME OF THE WELL-KNOWN CUSTOM BROWSERS
EPIC (https://www.epicbrowser.com/)
Epic is a privacy browser, as its tagline describes: "We believe what you browse and search should always be private." It is made to extend the online privacy of its users. The browser is based on the Chromium project, developed by the Hidden Reflex group, and available for both Windows and OS X.
The official website has a paragraph headed "Why privacy is important?" containing some unique and effective reasons, one being that the data collected from our browsing can decide whether we are eligible for a job, credit, or insurance. Epic was first built on Mozilla Firefox but was later changed to a Chromium base. It works much like the private browsing feature of Firefox and Chrome: it deletes every piece of session data such as cookies, caches, and any other temporary data on exiting the browser. It removes the Chrome services that send information to particular servers, and it adds a do-not-track header to avoid tracking by data collection companies. It also prefers SSL connections while browsing and contains a proxy to hide the user's IP address. To avoid leaking search preferences, Epic routes all search details through a proxy.
Here we saw a customized Chromium project, Epic, developed as a privacy-centric browser.
HconSTF (http://www.hcon.in/downloads.html)
HconSTF stands for Hcon Security Testing Framework. It is a browser-based testing framework: with a package of different addons added to the browser, it allows a user to perform web application penetration testing, web exploit development, and web malware analysis along with OSINT in a semiautomated fashion.
HconSTF has two variants: one based on Firefox, known as Fire base, and the other based on Chromium, known as Aqua base. Their rendering engines differ accordingly: Fire base uses Gecko and Aqua base uses WebKit. Both versions are loaded with tons of addons.
The core idea or inspiration for this project is taken from Hackerfox, but it is not quite the same. Hackerfox, http://sourceforge.net/projects/hackfox/, is portable
Arabic, Spanish, Turkish, French, Chinese simplified, and Chinese traditional. Being very popular in the security community, it comes installed by default in popular security operating systems such as Backtrack and Matriux.
With its security addons preinstalled and configured, and its simple yet user-friendly interface, Mantra is an integral part of every web application pen tester's arsenal. The tools available in Mantra focus not only on web application testing but also on web services and network application penetration testing. It contains tools to switch the user agent, manipulate cookies, manipulate parameters and their values, add a proxy, and much more. FireCAT is also included in Mantra, which makes it an even more powerful tool (we will cover FireCAT separately in the next topic).
Some of the popular tool groups are mentioned below:
• Information gathering
  • Flagfox
  • PassiveRecon
  • Wappalyzer
• Application audit
  • REST Client
  • Hackbar
  • DOM Inspector
• Editors
  • Firebug
• Proxy
  • FoxyProxy
• Network utilities
  • FireFTP
  • FireSSH
• Misc
  • Event Spy
  • Session Manager
FIGURE 3.3
Mantra browser interface.
Apart from tools it also contains bookmarks, divided into two sections. The first section, known as Hackery, is a collection of penetration testing links that help a user understand and refer to a particular attack. The other section, Gallery, contains links to tools that can be used for penetration testing.
We can download both versions of Mantra from http://www.getmantra.com/download.html, or from the individual links below. Mantra based on Firefox is available for different operating systems such as Windows, Linux, and Macintosh, whereas MOC is only available for Windows.
Mantra based on Firefox can be downloaded from http://www.getmantra.com/download.html.
Mantra based on Chromium can be downloaded from http://www.getmantra.com/mantra-on-chromium.html.
FireCAT (http://firecat.toolswatch.org/download.html)
FireCAT, or Firefox Catalog of Auditing exTensions, is a mind map collection of different security addons in a categorized manner. It has now been merged with the OWASP Mantra project to provide a one-stop solution for security addons based on browser customization. FireCAT contains seven categories and more than 15 subcategories.
The categories and subcategories:
• Information gathering
  • Whois
  • Location info
  • Enumeration and fingerprint
  • Data mining
  • Googling and spidering
• Proxies and web utilities
• Editors
• Network utilities
  • Intrusion detection system
  • Sniffers
  • Wireless
  • Passwords
  • Protocols and applications
• Misc
  • Tweaks and hacks
  • Encryption/hashing
  • Antivirus and malware scanner
  • Antispoof
  • Antiphishing/pharming/jacking
  • Automation
  • Logs and history
  • Backup/synchronization
  • Protection
  • IT security related
• Application auditing
One interesting category is "IT security related," because it provides plugins to collect information about common vulnerabilities and exposures (CVEs) and exploits from various sources such as the Open Source Vulnerability Database (OSVDB), Packet Storm, SecurityFocus, Exploit-DB, etc.
ORYON C (http://sourceforge.net/projects/oryon/)
Oryon C Portable is an open source intelligence framework based on the Chromium browser, meant for open source intelligence analysts and researchers. Like the other customized browsers it comes with lots of preinstalled tools and addons to support OSINT investigation. It also contains links to different online tools for better reference and research. It is a project by OsintInsight, so some of its functions can only be used after subscribing to an OsintInsight package.
FIGURE 3.4
Oryon C browser interface.
It’s a straightaway use tool so no need to install Oryon C, just download and
run it. It only supports Windows operating system 32 and 64 bit. The huge list
of useful addons and categorized bookmarks makes it a must-have for any online
investigator.
WhiteHat Aviator (https://www.whitehatsec.com/aviator/)
Though WhiteHat Aviator is not the only one of its kind, it is a product of a big, reliable security organization. WhiteHat Aviator is a private browsing browser, quite similar to the Epic browser we discussed earlier in this chapter. It removes ads and eliminates online tracking to let a user surf anonymously.
Like the Epic browser, Aviator is based on Chromium. By default it runs in incognito or private browsing mode, allowing a user to surf without storing any history, cookies, temporary files, or browsing preferences. It also disables autoplay of different media types; the user has to explicitly allow media such as Flash on any page to see it. It also uses the private search engine DuckDuckGo to avoid storing the search preferences of the user.
Unlike the Epic browser it is not open source, so the security community cannot audit the code or contribute much. Aviator is available for Windows as well as the Macintosh operating system.
TOR BUNDLE (https://www.torproject.org/projects/torbrowser.html.en)
TOR, or The Onion Routing project, is a very popular project; most of us have used, heard of, or read about it at some point. Though we will discuss it in detail in a later chapter, for the time being let's cover the basics of the Tor browser bundle. Like the Epic browser and WhiteHat Aviator, the Tor browser is a privacy-centric browser, but the way it works is quite different from the other two. Through the Tor application it uses a network of volunteer-run relays, bouncing connections around before they are sent or received. This makes it difficult to backtrack the location of the user, providing privacy and anonymity. Due to this proxy-chaining style of operation it can also be used to view content blocked for a particular location, such as a country. The Tor browser is available for different operating systems such as Windows, Linux, and Macintosh and can be used straightaway without installation. The Tor browser, previously known as TBB or the Tor Browser Bundle, is a customized browser based on Firefox. It contains Torbutton, Tor Launcher, a Tor proxy, HTTPS Everywhere, NoScript, and lots of other addons. Like OWASP Mantra it is available in 15 different languages.
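Applications other than the bundled browser reach Tor's anonymization through the local SOCKS proxy the Tor application exposes. A small sketch of the proxy configuration involved, assuming Tor's default SOCKS port of 9050 (the browser bundle listens on 9150 instead); the HTTP client that would consume this mapping (e.g. the third-party requests library with SOCKS support) is not shown:

```python
# Tor exposes a local SOCKS proxy, by default on port 9050. The
# "socks5h" scheme also resolves DNS through the proxy, which
# avoids leaking lookups outside the Tor network. This sketch only
# builds the settings dict; no connection to Tor is made here.
TOR_SOCKS_PORT = 9050

tor_proxies = {
    "http":  f"socks5h://127.0.0.1:{TOR_SOCKS_PORT}",
    "https": f"socks5h://127.0.0.1:{TOR_SOCKS_PORT}",
}

print(tor_proxies["http"])   # socks5h://127.0.0.1:9050
```

This mirrors what the Tor browser configures for itself internally: all traffic is funneled into the local Tor process, which then relays it through the volunteer network.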
CUSTOM BROWSER CATEGORY
We have now come across different custom browsers, their base builds, the rendering engines they use, and so on. Let's categorize them to understand their usability. For easy understanding, let's make three categories:
1. Penetration testing
2. OSINT
3. Privacy and anonymity
Under the first category we can place HconSTF, Mantra, and FireCAT; under the OSINT category we can add HconSTF and Oryon C; likewise we can put the Epic browser, WhiteHat Aviator, and the Tor browser under the privacy and anonymity category. If we ask what, at the core, puts these different browsers into different categories, the answer is their addons or extensions. By adding addons with similar functions we can create a customized browser for a specific purpose; if we want to create our own browser for some specific purpose, we must keep this in mind.
PROS AND CONS OF EACH OF THESE BROWSERS
Let’s start with the first browser we discussed and that is Epic browser. The advan¬
tage of using this browser is that it fully focuses on user privacy and anonymity. Apart
from that it’s open source and it can be used by all kind of users, technical as well as
nontechnical. The only disadvantage is that the reliability factor. Is this browser does
what it intends to do or does it do something else. As trust on the source is the key
here. So either trust the source and use the product or use it then trust the product.
The advantage of using HconSTF is that it’s a one stop solution for information
security researchers. The only disadvantage it has is that it does not allow a user to
upgrade it to the next level.
The advantage of OWASP Mantra is that it is available in different languages to
support security community from the different parts of the world. It has only one
disadvantage is that the light version or the MOC is only available for Windows, not
for other operating systems like Linux or Macintosh.
The advantage of Oryon C is that it is very helpful in OSINT exercises, but there
are different disadvantages like to use some of the modules a user need to subscribe
and also it is only available for Windows.
The disadvantage of the Whitehat Aviator is that it is not open source and it does
not have a version for Linux operating system.
TBB has the advantage is that it provides anonymity with a disadvantage like it
only comes with one rendering engine Gecko.
As we already discussed these custom browser categories; based on category,
user can choose which browser to use, but definitely the browsers for anonymity and
privacy have larger scope as they do not belong to any single category of users. Any
user who is concern about his/her online privacy can use these browsers. Like for
e-shopping, netbanking, social networking as well as e-mailing, these browser can
be helpful to all.
ADDONS
Browser addons, browser extensions, and plugins are often used as names for the same thing, though the terms differ between browsers: Firefox calls them addons and Chrome calls them extensions. Strictly speaking, a plugin is a different component from an addon, although some use the words as synonyms; in reality a plugin can be a part of an addon.
Browser addons are typically used to enhance the functionality of a browser. They are nothing but applications designed using web technologies such as HTML, CSS, and JavaScript. Due to differences in rendering engines, the structure and code of addons differ from browser to browser, but nowadays there are tools and frameworks available for designing cross-browser addons.
Addons are so popular that every web user has probably already used one at some point. Some popular examples are YouTube downloaders and Google Translate for general use, and SOA Client, REST Client, and Hackbar in the case of penetration testers.
We can install an addon quite easily in both Firefox and Chrome by simply clicking on the install button. Addons are not always safe, so choose them wisely: download from trusted sources and read the reviews first. Sometimes we need to restart the browser to run a particular addon. Like other software, addons keep looking for updates and update themselves automatically. Sometimes we may find that an addon is not compatible with the browser version, which means one of two things: (1) the browser version is outdated, or (2) the addon has not been updated to match the requirements of the latest browser. An addon can also affect the performance of a browser and even make it slow, so choose your addons wisely.
Let’s discuss some common addons and extensions that are available for both
Firefox as well as Chrome to serve in day-to-day life. Let’s see what kind of addons
are available and to serve what purpose.
We all use YouTube to watch video and share. Sometimes we also want to
download some YouTube videos so for that a number of addons are available
by installing which we do not need to install any other additional downloading
software. Another major issue we feel while watching videos in YouTube is that
the ads. Sometime we are allowed to skip the ads after 5 s and sometime we have
to watch the full 20 s ad. That is pretty annoying so there are addons available to
block ads on YouTube. Most of the people are addicted to social networking sites,
we generally open one or all of these at least once every day. Social networks like
Facebook, Linkedln, Twitter are like part of our life now. Sometime we need to
see the pictures of our friends or someone else in social networking sites and we
need to click on the picture to zoom that. It wastes lots of valuable time, so if we
want an addon to zoom all those for us when we point your mouse on the picture
then there is addons available known as hoverzoom both in Firefox as well as
Chrome.
There are different addons also available for chat notification, e-mail notification,
news, weather. It looks like think there are addons available for almost everything,
we just need to explore and definitely we will get one that will simplify our life. This
is just for brainstorm, now let’s discuss about some of the popular addons which will
help us in various different important tasks.
Copyrighted material
48 CHAPTER 3 Understanding Browsers and Beyond
SHODAN
Shodan is a plugin available for Chrome. A user just has to install it and forget about it. While we browse a site, it collects the information available about it in the Shodan database and provides details such as the website's IP address, who owns that IP, where it is hosted, open ports with popular services running, and some popular security vulnerabilities such as HeartBleed. This is definitely very helpful for penetration testers; if you haven't tried it yet, you must. The only limitation of this addon is that it shows results only for sites about which information is already available in Shodan's sources; it generally won't show results for new sites and staging sites, as its database may not contain information on them.
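To give an idea of the data involved, here is a small sketch that formats the kind of host record such a lookup surfaces (IP, owner, open ports, known vulnerabilities) into a one-line summary. The record below is an illustrative stand-in (documentation-reserved IP, fictitious owner), not real Shodan output, and the function is our own, not part of the addon:

```python
# A sketch of summarizing the kind of host record the addon surfaces:
# the site's IP, who owns it, open ports, and known vulnerabilities.
def summarize_host(record):
    ports = ", ".join(str(p) for p in sorted(record.get("ports", [])))
    vulns = ", ".join(sorted(record.get("vulns", []))) or "none known"
    return (f"{record['ip']} ({record['org']}) - "
            f"ports: {ports}; vulns: {vulns}")

sample = {   # hypothetical example record, not real Shodan data
    "ip": "203.0.113.10",          # documentation-reserved address
    "org": "Example Hosting Ltd",  # fictitious owner
    "ports": [443, 80, 22],
    "vulns": ["CVE-2014-0160"],    # HeartBleed, mentioned above
}

print(summarize_host(sample))
# 203.0.113.10 (Example Hosting Ltd) - ports: 22, 80, 443; vulns: CVE-2014-0160
```

The addon presents much the same fields inline in the browser as we visit each site.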
WAPPALYZER
Wappalyzer is another popular addon, available for both Firefox and Chrome. It uncovers the technology used by a web application. As with Shodan, we simply install it and forget it; Wappalyzer shows details about the technologies in use as we browse a page. It works by collecting information about technologies and their versions from the response headers, the source code, and other sources, based on signatures.
It identifies many different technologies, such as content management systems (CMS), e-commerce platforms, web server details, operating system details, JavaScript framework details, and many other things.
Some of the types of technologies identified by Wappalyzer are:
• Advertising networks
• Analytics platforms
• Content management system
• Databases
• E-commerce
• Issue trackers
• JavaScript frameworks
• Operating systems
• Programming languages
• Search engines
• Video players
• Wikis
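To illustrate the signature-matching idea, here is a toy detector in the spirit of Wappalyzer: known fingerprints are matched against response headers and page source. The signatures are simplified examples written for this sketch, not Wappalyzer's actual rule set:

```python
# A toy version of signature-based technology detection: match known
# fingerprints against the response headers and the page body.
import re

BODY_SIGNATURES = {
    "WordPress": re.compile(r'content="WordPress'),   # generator meta tag
    "jQuery":    re.compile(r"jquery[.-]"),           # script filename
}
HEADER_SIGNATURES = {
    "nginx": ("Server", re.compile(r"^nginx")),       # Server banner
}

def detect(headers, body):
    found = set()
    for tech, pattern in BODY_SIGNATURES.items():
        if pattern.search(body):
            found.add(tech)
    for tech, (header, pattern) in HEADER_SIGNATURES.items():
        if pattern.search(headers.get(header, "")):
            found.add(tech)
    return sorted(found)

headers = {"Server": "nginx/1.4.6"}
body = ('<meta name="generator" content="WordPress 3.9">'
        '<script src="/js/jquery-1.11.0.min.js"></script>')
print(detect(headers, body))   # ['WordPress', 'jQuery', 'nginx']
```

Real detectors carry thousands of such signatures and also extract version numbers from the matches, but the mechanism is the same.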
BUILTWITH
BuiltWith is similar to Wappalyzer: it also identifies the technologies used by a web application based on signatures, using the page source code, banners, cookie names, etc. While Wappalyzer is open source, BuiltWith is not. The paid version of BuiltWith has many more features than the free version, such as contact information detection and subdomain detection, which can be very helpful at times.
FIGURE 3.5
BuiltWith identifying technologies on Twitter.
FOLLOW
Follow.net is a competitive intelligence tool that helps us stay updated on the online movements of our competitors, and it can be accessed using the browser addon it provides. The major difficulty in keeping track of a competitor is that we have to waste lots of time visiting their websites, blogs, tweets, YouTube channels, etc., and after visiting all those sites we still don't have structured data from which to understand the trend being followed. Follow.net does most of this, and much more, for us, and provides a report on how our competitor is trending on the web. It collects information from various sources such as Alexa, Twitter, KeywordSpy, etc., and sends us notifications related to our competitors if something new comes up. The simple Follow addon provides a complete interface to browse through all this information in an efficient manner.
So if we are starting a business and want to learn the success mantra of our competitors, it is a must-have. The follow.net addon is available for both the Firefox and Chrome browsers.
RIFFLE
Riffle by CrowdRiff is a social analytics addon focused on the popular microblogging site Twitter. It provides a smart Twitter dashboard which displays useful analytical data about a Twitter user of our choice.
It provides helpful information for building a popular account by pointing to influential tweets and the accounts that posted them. It also provides quick insight into a Twitter user, helping us understand that particular user and reply in an appropriate way.
FIGURE 3.6
Riffle interface integrated into the browser.
Some of the key features provided by this extension are tweet source tracking, activity breakdown, and engagement assessment, all in a clean user interface. It's a must-have for power users of Twitter.
WhoWorks.at
Similar to Riffle, a Twitter-focused addon, we have whoworks.at, a LinkedIn-specific addon. Let's take a scenario where we are salespersons and need to gather information about the key influential persons of a company; how do we proceed? We go to LinkedIn, search for that particular company, and then find the 1st-degree, 2nd-degree, or 3rd-degree connections; based on their titles we might want to add them to discuss business. That is the old-fashioned way. There is now a more automated way to do the same: install the whoworks.at extension on Chrome, visit the website of the company we are interested in, and let the extension show us the 1st-degree, 2nd-degree, and 3rd-degree connections at that company, along with details such as recent hires, promotions, or title changes.
This is the power of whoworks.at: it finds the connections for us when we visit a website and saves us a lot of time.
ONETAB
Onetab is an addon or extension available for both browsers, Firefox as well as Chrome. It provides a solution for tab management: it takes the tabs that are open in our browser and aggregates them into a list under a single tab. This is especially useful in Google Chrome which, as we already learned, is a tab-centric browser. The tab is the main thread in Chrome, so by converting tabs into a list Onetab can save a lot of memory; we can later restore the tabs one by one or all at once, as we wish.
SALESLOFT
Most salespeople must have used it; if not, they should. It's simply a dream addon for salespersons. It allows us to create a prospecting list by browsing profiles on different social networks, focusing on a particular market segment for leads. It allows a user to run specific searches based on title, organization, or industry name. Among its popular features, it can gather contact information for a prospect from LinkedIn, including name, e-mail id, and phone number. We can add any result as a prospect with a single click, import prospects from LinkedIn and export them to Excel or Google spreadsheets, and even synchronize the data directly with salesforce.com.
It is a one-stop, free, and lightweight solution for every salesperson. Use it and enhance your lead generation with its semiautomated approach.
PROJECT NAPTHA
We all know it is nearly impossible to copy the text present in an image; one method is to type it out manually, but that is a tedious experience. So here is the solution: Project Naptha. It is an awesome addon which gives us the freedom to copy, highlight, edit, and even translate the text on any image present on the web using its advanced OCR technology. It's available for Google Chrome.
TINEYE
TinEye is a reverse image search engine, and its addon is used for the same purpose. Just as we enter keywords in search engines to get the required results, TinEye can be used to search for a particular picture in the TinEye database, which has a large number of images indexed. The method behind its image identification technology is that it creates a unique signature for each and every image it indexes in its database. When a user searches for a picture it compares signatures, and most of the time it returns the exact result; apart from the exact result it also gives similar results. Another great feature of TinEye is that it can search for cropped, resized, and edited images and still give almost exact results. TinEye is available for both browsers, Firefox as well as Chrome.
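TinEye's actual signature algorithm is proprietary, but the underlying idea can be illustrated with a minimal "average hash": reduce an image to a compact fingerprint that survives resizing and minor edits, then compare fingerprints instead of raw pixels. The tiny matrices below stand in for real image thumbnails.

```python
# Illustrative "image signature": an average hash over a small grayscale
# matrix. Real reverse-image engines use far more robust signatures, but
# the principle is similar: hash every indexed image, then rank matches
# by how few bits differ between fingerprints.

def average_hash(pixels):
    """pixels: 2D list of grayscale values (0-255), e.g. an 8x8 thumbnail."""
    flat = [p for row in pixels for p in row]
    avg = sum(flat) / len(flat)
    # One bit per pixel: brighter than average -> 1, else 0.
    return sum(1 << i for i, p in enumerate(flat) if p > avg)

def hamming_distance(h1, h2):
    """Number of differing bits; a small distance means a likely match."""
    return bin(h1 ^ h2).count("1")

img = [[10, 200], [220, 30]]      # tiny 2x2 stand-in for an image
edited = [[12, 198], [215, 35]]   # slightly brightened/darkened copy
print(hamming_distance(average_hash(img), average_hash(edited)))  # 0
```

Because the hash depends on each pixel's relation to the average rather than its exact value, the lightly edited copy still produces the same fingerprint, which is why such engines can match cropped or recompressed images.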
REVEYE
Reveye is quite similar to TinEye. This addon is only available for Chrome and works very simply: it gives the user the results of a reverse image search based on results provided by both Google reverse image search and TinEye reverse image search.
52 CHAPTER 3 Understanding Browsers and Beyond
CONTACTMONKEY
Contactmonkey is a very useful addon for all professionals, especially in sales. It helps us to track our e-mails. Using this simple addon we can identify whether the person we sent an e-mail to has opened it, and at what time. This helps us determine whether our mails are being read or are simply filling up spam folders, and also what the best time to contact a person is. Though the free version has some limitations, it is still very useful.
If you want to improve your user experience of the Google Chrome browser, this list by Digital Inspiration is a must to look at. It contains Chrome extensions and apps which enhance Chrome's features as well as the user experience. It is available at the following URL:
http://digitalinspiration.com/google-Chrome.
BOOKMARK
Bookmarking is a common feature of every browser. It allows us to save a website's URL under a name for later use. While browsing we often come across interesting pages, but due to lack of time we cannot go through all of them right then. This is where bookmarks help us save those links for future use.
There are two popular ways to save a bookmark:
1. By clicking on the bookmark button when we are on the page that needs to be bookmarked.
2. By pressing Ctrl+D when we are on the page that needs to be bookmarked.
We can import as well as export bookmarks from one browser to another, and we can also create a new folder for a list of bookmarks. In Firefox we go to the Show All Bookmarks link or press Ctrl+Shift+B, where we get all those options directly or by right-clicking on that page. Similarly, in Chrome we go to the bookmark manager, where we also find all the options on the page itself, or else we right-click on the page to get them.
THREATS POSED BY BROWSERS
As we discussed, browsers are a great tool which allows us to access the web, and the availability of various addons simply enhances their functionality. This wide usage of browsers also presents a huge threat. Being among the most widely used pieces of software, browsers are a favorite attack vector of many cyber attackers. Attackers try to exploit most client-side vulnerabilities through the browser alone: phishing, cookie theft, session hijacking, cross-site scripting, and lots of others. Similarly, browsers are one of the biggest actors playing a role in identity leakage. So use your browser wisely. In later chapters we will discuss some methods to stay secure and anonymous online. For now let's move to our next chapter, where we will learn about various types of unconventional but useful search engines.
CHAPTER 4
Search the Web—Beyond Convention
INFORMATION IN THIS CHAPTER
• Search engines
• Unconventional search engines
• Unconventional search engine categories
• Examples and usage
INTRODUCTION
So, in the second chapter we learned how to utilize the advanced search of some of the social network platforms to get precise results; in the third chapter we moved on to see how to better utilize our common browsers in uncommon ways; and now this chapter is about search engines.
We are all familiar with search engines and use them for our day-to-day research. As discussed in previous chapters, what search engines basically do is crawl through the web using web spiders and index web pages based on a wide range of parameters, such as keywords, backlinks, etc.; based on this indexing we get results for the keywords we supply. Some of the most popular search engines are Google, Yahoo, and Bing.
Different search engines use different methods to rate links and, based upon their algorithms, assign different ranks to different websites. When we search for a term, the search engines provide results based upon these ranks. These ranks keep changing based upon various factors, which is why we might get different results for the same query on different dates.
So now it is safe to say that as average users we are familiar with search engines and their usage. As stated earlier this chapter is about search engines, but not the conventional ones we use daily. The search engines we will be dealing with in this chapter are specialized: some of them perform their search operations in a different manner and some of them provide a search facility for a specific domain. But are they really required when we have search engines like Google, which are very advanced and keep on updating with new features? The short answer is yes. Though search engines like Google are very good at what they do, they provide generic results in the form of website links which, according to them, are relevant for the query keywords; but sometimes we need specific answers related to a specific domain, and this is when we need specific types of search engines. Let's go ahead and get familiar with these and find out how useful they are.
Hacking Web Intelligence. http://dx.doi.org/10.1016/B978-0-12-801867-5.00004-5
Copyright © 2015 Elsevier Inc. All rights reserved.
META SEARCH
When we send a request to a regular search engine, it looks up the relevant results in its own database and presents them. But what if we want to get results from multiple search engines? This is where meta search engines come in. What meta search engines do is send the user's query to multiple data sources, such as search engines, databases, etc., at once and aggregate the results into a single interface. This makes the search results more comprehensive and relevant and also saves the time of searching multiple sources one at a time. Meta search engines do not create a database of their own, but rely on various other databases to collect the results.
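The aggregation step just described can be sketched in a few lines: merge ranked result lists from several sources, deduplicate by URL, and re-rank so that URLs which several sources agree on (and rank highly) come out on top. The engine names and URLs below are mock data standing in for real source responses, not any engine's actual API output.

```python
# Sketch of meta-search aggregation: combine per-engine ranked URL lists
# into one ranking using a simple reciprocal-rank score.

def aggregate(results_by_engine, top_n=5):
    """results_by_engine maps an engine name to its ranked list of URLs."""
    scores = {}
    for engine, urls in results_by_engine.items():
        for rank, url in enumerate(urls):
            # 1/(rank+1) rewards high placement; summing across engines
            # rewards URLs that multiple sources returned.
            scores[url] = scores.get(url, 0.0) + 1.0 / (rank + 1)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n]

mock = {
    "engine_a": ["https://a.example", "https://b.example", "https://c.example"],
    "engine_b": ["https://b.example", "https://a.example"],
    "engine_c": ["https://b.example", "https://d.example"],
}
print(aggregate(mock, top_n=2))  # b.example outranks a.example
```

Real meta search engines add source weighting, snippet merging, and deduplication of near-identical URLs, but the consensus-ranking idea is the same.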
PolyMeta (http://www.polymeta.com/)
Polymeta is a great meta search engine which sends the search query to a wide range of sources, takes the top results from each of them, and further ranks them. The search results of Polymeta contain not only the URLs but also their social network popularity, through the number of Facebook likes for each URL. We can further drill down into the results through the "search within" feature, which allows us to search for keywords inside the already aggregated results.
FIGURE 4.1
Meta search engine—Polymeta.
Polymeta categorizes the results into topics, which are displayed inside a panel
on the left; results for news, images, videos, blogs are displayed in separate panels
on the right. It also allows us to select the sources from a list for different categories.
Ixquick (https://www.ixquick.com)
Ixquick is another meta search engine and, in its own words, is "the world's most private search engine." Apart from its great capability to search and present results from various sources, it also provides a feature to access the results through the Ixquick proxy. In the search results, below every result there is an option named "proxy"; clicking on it takes us to the result URL through the proxy (https://ixquick-proxy.com), which allows us as users to maintain our anonymity.
Apart from the regular web, image, and video search, Ixquick provides a unique search capability: phone search. We can not only search for the phone numbers of people but also do a reverse phone search. This means we provide the phone number and choose the country code, and it fetches the information of the owner. Not only this, the phone search functionality also allows us to search for phone numbers of businesses; we simply need to provide the business name and location details. Ixquick also provides advanced search, which can be accessed at the following URL: https://www.ixquick.com/eng/advanced-search.html.
FIGURE 4.2
Ixquick phone search.
Mamma (http://mamma.com/)
Mamma is yet another meta search engine. Like any meta search engine it aggregates its results from various sources, but that is not all that makes it stand out. The clean and simple interface provided by Mamma makes it very easy to use, even for a first-time user. The result page is clean and very elegant. We can access various categories such as news, images, video, etc. through simple tabs which are integrated into the interface itself once used. Clicking on the Local button allows us to get region-specific results.
The tabulation feature we discussed creates different tabs not only for categories but also for different queries, which allows us to access results from previous searches easily.
PEOPLE SEARCH
Now that we have a fair understanding of how meta search works, let's move on to learn how to look for people online. There are many popular social media platforms like Facebook (facebook.com), LinkedIn (linkedin.com), etc. where we can find out a lot about people; here we will discuss search engines which index results from platforms like these. In this section we will learn how to search for people online and find related information. The information we expect from this kind of engagement is full name, e-mail address, phone number, address, etc.; all of this can be used to extract further information. Such information is very relevant when we require information about a person to perform a social engineering attack for an InfoSec project, or need to understand the persona of a potential client.
Spokeo (http://www.spokeo.com)
When it comes to searching for people, especially in the US, no one comes close to this people search engine. Though most of the information provided by it is now paid, as opposed to its previous versions, speaking from past experience it is a great platform which provides a variety of information related to a person, ranging from basic information such as name, e-mail, and address to information like neighborhood, income, social profiles, and much more. It allows us to search for people by name, e-mail, phone, username, and even address. The price package of the information provided seems reasonable, and it is recommended for anyone who deals with digging up information about people.
Pipl (https://pipl.com/)
Pipl is a great place to start looking for people. It allows us to search using name, e-mail, phone number, and even username. The search results can be further refined by providing a location. Unlike most search engines, which crawl through the surface web only, Pipl digs into the deep web to extract information for us (the concept of the deep web will be discussed in detail in a later chapter); this unique ability allows it to provide results which other search engines cannot. The results provided are pretty comprehensive and are categorized into sections such as Background, Profiles, Public Records, etc. The results can also be filtered based upon age. All in all it is one of the few places which provide relevant people search results without much effort, and hence it must be tried.
FIGURE 4.3
Searching people using Pipl.
PeekYou (http://www.peekyou.com/)
PeekYou is yet another people search engine. It allows us to search not only using the usual keyword types such as name, e-mail, username, and phone, but also using terms like interests, city, work, and school. These unique types make it very useful when we are searching for alumni, coworkers, or even people from the past with whom we lived in the same city. The sources of information it uses are quite wide, and hence so are the results; the best part is that it's all free.
Yasni (http://www.yasni.com/)
Yasni is a tool for people who want to find people with specific skill sets. It allows us to search for people not only by their name but also by the domain they specialize in or their profession. The wide range of result categories provided by Yasni makes it easy to find the person of interest; some of these categories are images, telephone and address, interests, business profile, documents, and much more. This platform provides a one-stop shop for multiple requirements related to searching for people online.
LittleSis (http://littlesis.org/)
LittleSis is not exactly a general people search, but is more focused on people at the top of the business and political food chain, so searching for common people here would be a waste of time. It is good at what it does, though, and can reveal interesting and useful information about business tycoons and political czars. Apart from basic information such as introduction, DoB, sex, family, friends, education, etc., it also shows information like relationships, which lists the positions and memberships that the person holds or has ever held, and interlocks, which lists people with positions in the same organizations. It is a good place to research people with power and who they are associated with.
MarketVisual (http://www.marketvisual.com/)
MarketVisual is also a specialized search engine which allows us to search for professionals. We can search for professionals by their name, title, or company name. Once the search is complete it presents a list of entities with associated information such as number of relationships, title, and company. The best part about MarketVisual is the visualization it creates of the relationships of an entity once we click on it. These data can further be downloaded in various formats for later analysis. It is a great tool for market research.
FIGURE 4.4
MarketVisual displaying connection graph.
TheyRule (http://theyrule.net/)
Similar to MarketVisual, TheyRule also provides a medium to search for professionals across top global corporates. At first look the interface makes us doubt whether there is actually any information, as there is only a small list of links in the top left corner, in a smaller than average font size; but once we start to explore these links we can find an ocean of interesting information. Clicking on the companies link provides a huge list of companies; once we click on a company it presents a visual representation of it. Hovering over this icon provides an option to show directors and research further. The directors are likewise represented through the visualization; if a director is on more than one board, hovering over his/her icon provides the option to show that as well. It also provides an option to find connections between two companies. Apart from this it also lists interesting maps created by others, such as Too Big To Fail Banks, and lets us save our own.
BUSINESS/COMPANY SEARCH
Today almost every company has an online presence in the form of a website, one or more social media profiles, etc. These mediums provide a great deal of information about the organization they belong to, but sometimes we need more. Be it researching a competing business, a potential client, a potential partner, or simply the organization where we applied for an opening, there are platforms which can help us to understand them better. Let's learn about some of them.
LinkedIn (https://www.linkedin.com/vsearch/c)
LinkedIn is one of the most popular professional social media websites. We have already discussed LinkedIn search in a previous chapter, but when it comes to researching companies we simply can't ignore it. Most tech-savvy corporates do have LinkedIn profiles. These profiles list some interesting information which is usually not found on corporate websites, such as company size, type, and specific industry. It also shows the number of employees of the company who have a profile on the platform; we can simply browse the list of these employees and check their profiles, depending upon who or what we are looking for. Apart from this we can see regular updates from the company on its profile page and understand what it is up to. It also allows us to follow companies using a registered account so that we receive regular updates from them.
Glassdoor (http://www.glassdoor.com/Reviews/index.htm)
Glassdoor is a great platform for job seekers, but it also provides a huge amount of relevant information on companies. Apart from the usual information such as company location, revenue, competitors, etc., we can also find information such as employee reviews, salaries, current opportunities, and interview experiences. The best part is that the information is provided not just by the organization itself but also by its employees, hence it provides a much clearer view of the internal structure and working. Similar to LinkedIn, Glassdoor also provides an option to follow company profiles to receive updates.
FIGURE 4.5
Glassdoor company search interface.
Zoominfo (http://www.zoominfo.com/)
Zoominfo is a business-to-business platform which is mainly used by sales and marketing representatives to find details about companies as well as the people working in them, such as e-mail, phone number, address, relationships, etc. Though the free account has various limitations, it is still a great tool to find information about organizations and their employees.
REVERSE USERNAME/E-MAIL SEARCH
Now that we have learned how to extract information related to people and companies, let's take this a step further and see what other information we can extract using the username of a person, which in most cases is the e-mail address of the person.
EmailSherlock (http://www.emailsherlock.com/)
EmailSherlock is a reverse e-mail search engine. Once we provide an e-mail address to it, it looks up whether that e-mail has been used to register an account on a wide range of websites, mostly social media, and gets us the results in the form of any information it can extract from these platforms. This kind of information can be very helpful in case we just have the e-mail address of the person of interest. Once we know the platforms on which this particular person is registered, we can go ahead and create an account there, and we might be able to extract information which we were not allowed to access otherwise. Similar to EmailSherlock there is another service called UserSherlock (http://www.usersherlock.com/) which does the same thing for usernames.
Though the results provided by these services are not 100% accurate, they provide a good place to start.
FIGURE 4.6
EmailSherlock interface.
CheckUsernames (http://checkusernames.com/)
Similar to UserSherlock, CheckUsernames runs the username provided to it through a huge list of social media websites and checks whether that username is available on them or not.
Namechk (http://namechk.com/)
Like CheckUsernames and UserSherlock, Namechk also checks the availability of
the provided username on a huge list of social media sites.
KnowEm (http://knowem.com/)
The website discussed above (checkusernames.com) is powered by KnowEm, which can similarly be used to check for usernames; additionally it checks for domain names as well as trademarks.
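Under the hood, these username checkers all do the same thing: build each site's profile URL for a candidate username, then probe it (an HTTP 404 usually suggests the name is unclaimed). The pattern table below is illustrative; real services maintain hundreds of such patterns.

```python
# Sketch of a username checker's core: map a username onto each site's
# profile URL. Probing is then one HTTP request per URL.

PROFILE_PATTERNS = {
    "twitter": "https://twitter.com/{}",
    "github": "https://github.com/{}",
    "instagram": "https://instagram.com/{}",
}

def profile_urls(username):
    """Return {site: profile_url} for every known pattern."""
    return {site: pat.format(username) for site, pat in PROFILE_PATTERNS.items()}

# Probing each URL could use the standard library, e.g.:
#   from urllib.request import urlopen
#   urlopen(url)  # an HTTPError 404 suggests the username is free
print(profile_urls("johndoe")["github"])  # https://github.com/johndoe
```

Note that status codes are only a heuristic: some sites return 200 for nonexistent profiles or block automated probes, which is one reason these services are not 100% accurate.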
Facebook (https://www.facebook.com/)
Unlike most social network sites, Facebook allows us to search for people using e-mail addresses, and being one of the largest social networks it can be very helpful when searching for people online.
61
Copyrighted material
62 CHAPTER 4 Search the Web—Beyond Convention
SEMANTIC SEARCH
In chapter 2 we discussed the semantic web and how it will be an integral part of the web of the future. Let's get familiar with some of the semantic search engines and see how mature they are.
DuckDuckGo (https://duckduckgo.com)
Though the name DuckDuckGo may sound a bit odd for a search engine, the search results provided by it are quite amazing. This new kid on the block is slowly challenging the search giant Google based on its unique selling proposition (USP): it does not track its users. The search results provided by it are very relevant, minus the clutter; there are not many ads and sidebars to fill up the space. It provides the possible meanings of the query, which helps the user select the one matching his/her intention and get the results accordingly. Similar to Google it also provides answers to mathematical queries, and even answers queries like "weather" with the weather for our location. The definition tab simply provides the dictionary meaning of the keyword supplied. The bar under the query box is very relevant and provides categories for topics. It is populated depending upon the search query: searching for a music band populates it with related videos, whereas searching for Thailand beaches displays images of the beaches; it even responds to queries like "what rhymes with you" with relevant results. The rapid growth and incredible features make it real competition for major search engines like Google, Bing, and Yahoo, and it is slowly gaining the recognition it deserves. It is a must-try for anyone who is enthusiastic about new ways of exploring the web.
FIGURE 4.7
DuckDuckGo results.
Kngine (http://kngine.com/)
Kngine is a great search engine with semantic capabilities. Unlike conventional search engines it allows us to ask questions and tries to answer them. We can input queries like "who was the president of Russia between 1990 and 2010" and it presents us with a list containing the names, images, term years, and other details related to Russia. Similarly, searching for "GDP of Italy" gives a great amount of relevant information in the form of data and graphs, minus the website links. So the next time a question pops up in our mind we can surely give it a try.
FIGURE 4.8
Kngine result for semantic query.
SOCIAL MEDIA SEARCH
Social media is a vast platform and its impact is similarly large, be it on a personal or corporate level. Previously we discussed social media and how to search through some specific social network platforms; now let's check out some of the social media search engines and their capabilities.
SocialMention (http://socialmention.com/)
What SocialMention provides is basically real-time social media search and analysis. But what does that mean? Let's break it up into two parts: search and analysis. As for the search part, SocialMention searches various social media platforms like blogs, microblogs, social networks, events, etc., and even through the comments. The results provided can be sorted by date and source and can be filtered for timelines like last hour, day, week, etc. Apart from this, SocialMention also provides an advanced search option, using which we can craft queries to get more precise results. Unlike conventional search engines, searching through social media specifically has a huge advantage: we are able to understand the reach and intensity of the terms we are searching for in the content created by people. Through this we can have a better understanding of how people relate to these terms and up to what level.
Now let's move on to the analysis part. SocialMention not only provides the search results for our queries but also indicates the level of sentiment associated with them. It also displays the level of strength, passion, and reach of our query terms in the vast ocean of social media. Apart from this we can also see the top keywords, users, hashtags, and sources related to the query. One of the best features provided by this unique platform is that we can not only see this information, but also download it in the form of CSV files. If all this was not sufficient, SocialMention also allows us to set up e-mail alerts for specific keywords. The kind of information this platform provides is not only helpful for personal use but can also have a huge impact for businesses; we can check how our brand is performing in the social arena and respond accordingly.
FIGURE 4.9
SocialMention displaying results and associated statistics.
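The sentiment indicator such platforms compute can be illustrated with a toy word-list scorer: count positive and negative words across the fetched posts and report the totals. Real engines use far richer models; the word lists and sample posts here are made up for illustration.

```python
# Toy sentiment scoring over a batch of social media posts: tally hits
# against small positive/negative word lists.

POSITIVE = {"great", "love", "awesome", "good"}
NEGATIVE = {"bad", "hate", "awful", "poor"}

def sentiment(posts):
    """Return (positive_hits, negative_hits) across a list of posts."""
    pos = neg = 0
    for post in posts:
        for word in post.lower().split():
            w = word.strip(".,!?")   # drop trailing punctuation
            if w in POSITIVE:
                pos += 1
            elif w in NEGATIVE:
                neg += 1
    return pos, neg

posts = ["I love this brand, great support!", "Awful delivery, bad packaging."]
print(sentiment(posts))  # (2, 2)
```

Dividing the two counts gives the kind of positive-to-negative ratio SocialMention displays; tracking that ratio over time is the basis of brand monitoring.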
Social Searcher (http://www.social-searcher.com/)
Social Searcher is yet another social media search engine. It uses Facebook, Twitter, and Google+ as its sources. The interface provided by this search engine is simple. Under the search tab the search results are distributed into three tabs based on the source; under these tabs the posts are listed with a preview, which is very helpful in identifying the ones relevant to us. Similar to SocialMention, we can set up e-mail alerts as well.
Under the analytics tab we can get the sentiment analysis, users, keywords, domains, and much more. One of the more interesting of these is the popular tab, which lists the results with the most interaction, such as likes, retweets, etc.
TWITTER
Twitter is one of the most popular social networking sites, with a huge impact. Apart from its usual microblogging functionality, it also allows us to understand the reach and user base of any entity, which makes it a powerful tool for reconnaissance. Today it is widely used for market promotion as well as for analyzing the social landscape.
Topsy (http://topsy.com/)
Topsy is a tool which allows us to search and monitor Twitter. Using it we can check
out the trend of any keyword over Twitter and analyze its reach. The interface is
pretty simple and looks like a conventional search engine, just the results are only
based on Twitter. The results presented can be narrowed down to various timeframes
such as 1 day, 30 days, etc. We can also filter the results to only see
images, tweets, links, videos, or influencers. There is another filter which allows us to
see only results in specific languages. All in all, Topsy is a great
tool for market monitoring for specific keywords.
FIGURE 4.10
Topsy search.
66 CHAPTER 4 Search the Web—Beyond Convention
Trendsmap (http://trendsmap.com/)
Trendsmap is a great visual platform which shows trending topics in the form of keywords,
hashtags, and Twitter handles from the Twitter platform over the world map.
This visual representation of trends makes it easy to understand
what's hot in a specific region of the world. Apart from showing this visual form of
information, it also allows us to search through this information by topic
or location, which makes it easier for us to see only what we want.
Tweetbeep (http://tweetbeep.com/)
In its own words, Tweetbeep is like Google Alerts for Twitter. It is a great service
which allows us to monitor topics of interest on Twitter, such as a brand name, product,
updates related to companies, and even links. For market monitoring purposes
it's a great tool which can help us to quickly respond to topics of interest.
Twiangulate (http://twiangulate.com/search)
Twiangulate is a great tool which allows us to perform Twitter triangulations. Using
it we can find the common people who follow, and are followed by, two different
Twitter users. Similarly it also provides the feature to compare the
reach of two users. It is a great tool to understand and compare the influence of different
Twitter users.
SOURCE CODE SEARCH
Most of the search engines we have used only look for the text visible on the web
page, but there are some search engines which index the source code present on the
internet. These kinds of search engines can be very helpful when we are looking for a
specific technology used over the internet, such as a content management system
like WordPress. Uses of such search engines include search engine optimization,
competitive analysis, and keyword research for marketing, and are limited only by the
creativity of the user.
Earlier, due to storage and scalability issues, there were no service providers in
this domain, but with technological advancements some options are opening up now.
Let's check out some of these.
NerdyData (http://nerdydata.com)
NerdyData is a unique, first-of-its-kind search engine which allows us to search
the source code of web pages. Using the platform is pretty simple: go to the URL
https://search.nerdydata.com/, enter a keyword like WordPress 3.7, and NerdyData
will list the websites which contain that keyword in their source code. The
results not only provide the URL of the website but also show the section of the
code with the keyword highlighted, under the section Source Code Snippet. Apart
from this there are various features such as contact author, fetch backlink, and others
which can be very helpful, but most of these are paid; still, the limited free usage of
NerdyData is very useful and worth a try.
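A crude local version of this idea is to fetch a page ourselves and check its source for technology fingerprints. The fingerprint strings below are illustrative assumptions (real code search engines index far larger signature sets):

```python
import urllib.request

# Hypothetical fingerprint strings; real detection engines use far larger lists.
FINGERPRINTS = {
    "WordPress": ["wp-content", "wp-includes"],
    "Drupal": ["Drupal.settings", "/sites/default/files"],
}

def detect_technologies(html):
    """Return the technologies whose fingerprint appears in the page source."""
    found = []
    for tech, markers in FINGERPRINTS.items():
        if any(marker in html for marker in markers):
            found.append(tech)
    return found

if __name__ == "__main__":
    # Network access required; replace the URL with a site you may inspect.
    html = urllib.request.urlopen("http://example.com").read().decode("utf-8", "ignore")
    print(detect_technologies(html))
```

The difference, of course, is scale: NerdyData has already crawled and indexed the source code, so it answers the reverse question ("which sites contain this string?") without fetching anything at query time.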
FIGURE 4.11
NerdyData code search results.
Ohloh Code (https://code.ohloh.net)
Ohloh Code is another great search engine for source code searching, but it's
a bit different in that it searches open source code. What this means
is that its source of information is the code residing in open spaces, such as Git
repositories.
It provides great options to filter out the results based on definitions, languages
(programming), extensions, etc. through a bar on the left-hand side titled “Filter
Code Results.”
Searchcode (https://searchcode.com)
Similar to Ohloh, Searchcode also uses open source code repositories as its information
source. The search filters provided by Searchcode are very helpful; some of them
are repository, source, and language.
TECHNOLOGY INFORMATION
In this special section we will be working with some unique search
engines which will help us to gather information related to various technologies
and much more. In this segment we will be dealing heavily with IP addresses
and related terms, so it is advised to go through the section "Defining the basic terms"
in the first chapter.
Whois (http://whois.net/)
Whois is basically a service which allows us to get information about the registrant
of an internet resource such as a domain name. Whois.net provides a platform using
which we can perform a Whois search for a domain or IP address. A Whois record
usually consists of registrar info, dates of registration and expiry, and registrant info
such as name, e-mail address, etc.
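The Whois protocol itself is simple enough to query directly: a plain TCP conversation on port 43. A rough Python sketch follows; whois.verisign-grs.com is the registry server for .com domains, and the parser is a naive assumption about the "Key: value" record layout, which varies between registries:

```python
import socket

def whois_query(domain, server="whois.verisign-grs.com", timeout=10):
    """Raw WHOIS lookup: send the domain over TCP port 43 and read the reply."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall((domain + "\r\n").encode())
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", "ignore")

def parse_record(raw):
    """Naive parse of 'Key: value' lines in a WHOIS reply into a dict."""
    fields = {}
    for line in raw.splitlines():
        if ":" in line and not line.lstrip().startswith("%"):
            key, _, value = line.partition(":")
            if value.strip():
                fields[key.strip().lower()] = value.strip()
    return fields

if __name__ == "__main__":
    # Network access required; queries the .com registry WHOIS server.
    record = parse_record(whois_query("elsevier.com"))
    print(record.get("registrar"), record.get("creation date"))
```

Web front ends like Whois.net wrap exactly this sort of exchange, choosing the right server per top-level domain and formatting the reply.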
Robtex (http://www.robtex.com)
Robtex is a great tool to find out information about internet resources such as IP
addresses, domain names, Autonomous System (AS) numbers, etc. The interface is
pretty simple and straightforward. At the top left-hand corner is a search bar using
which we can look up information. Searching for a domain gives us related information
like IP address, route, AS number, location, etc. Similar information is
provided for IP addresses, routes, etc.
W3dt (https://w3dt.net/)
W3dt is a great online resource to find out networking-related information. There are
various sections which we can explore using this single platform. The first section is
domain name system (DNS) tools, which allows us to perform various DNS-related
queries such as DNS lookup, reverse DNS lookup, DNS server fingerprinting, etc.
The second section provides tools related to the network/internet such as port scan,
traceroute, MX record retriever, etc. The next section is web/HTTP, which consists of tools
such as SSL certificate info, URL encode/decode, HTTP header retrieval, etc. Then
comes the database lookups section, under which come MAC address lookup, Whois
lookup, etc., and in the end there are some general and ping-related tools. All in all it is a
great set of tools which allows us to perform a huge list of different useful functions
under a single interface.
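A basic TCP port scan, like the one W3dt offers, boils down to attempted connections: if the connection succeeds, the port is open. A minimal sketch, to be run only against hosts we are authorized to test:

```python
import socket

def scan_port(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds (port open)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0

def scan_ports(host, ports):
    """Check a handful of ports and report the open ones."""
    return [port for port in ports if scan_port(host, port)]

if __name__ == "__main__":
    # Scan a few common service ports on the local machine.
    print(scan_ports("127.0.0.1", [21, 22, 80, 443, 8080]))
```

Real scanners such as Nmap add many refinements (SYN scans, timing control, service detection), but the connect-and-check idea above is the core of it.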
Shodan (http://www.shodanhq.com/)
So far we have used various types of search engines which help us to explore the web
in all different ways. What we haven’t encountered till now is an internet search engine
(remember the difference between web and internet explained in chapter 1) or simply
said a computer search engine. Shodan is a computer search engine which scans the
internet and grabs the service banner based on IP address and port. It allows us to search
this information using IP addresses, country filters, and much more. Using it we can
find simple information such as websites using a specific web server like
Internet Information Services (IIS) or Apache, and also information which can be quite
sensitive, such as IP cameras without authentication or SCADA systems exposed to the internet.
Though the free version without registration provides very limited information,
which can be mitigated a bit by using a registered account, it is sufficient to
understand the power of this unique search engine. We can also utilize the power of this
tool through a browser add-on or through its application programming interface (API).
Shodan has a very active development history and comes up with new features all the
time, so we can expect much more from it in the future.
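The banner grabbing Shodan performs can be reproduced on a small scale with nothing but sockets: connect to a port and read the service's greeting. In the sketch below, ftp.gnu.org is used as a well-known public FTP server, and the classifier is a naive assumption about FTP's "220" greeting format:

```python
import socket

def grab_banner(host, port, timeout=5):
    """Connect and read whatever the service announces first (its banner)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        return sock.recv(1024).decode("utf-8", "ignore").strip()

def classify_banner(banner):
    """Very rough guess at the service software from an FTP-style banner."""
    if banner.startswith("220"):
        return banner[3:].strip() or "unknown FTP server"
    return "unrecognized service"

if __name__ == "__main__":
    # Network access required; grabs the greeting of a public FTP server.
    banner = grab_banner("ftp.gnu.org", 21)
    print(banner, "->", classify_banner(banner))
```

Shodan's contribution is doing this across the whole IPv4 space, for many ports, continuously, and making the resulting banners searchable.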
FIGURE 4.12
Shodan results for port 21.
WayBack Machine (http://archive.org/web/web.php)
Internet Archive WayBack Machine is a great resource to look up how a website looked
in the past. Simply type the website address into the search bar and it will return
a timeline with the available snapshots highlighted on the calendar. Simply hovering
over these highlighted dates presents a link to the snapshot. It is a
great tool to analyze how a website has evolved and thus monitor its past growth. It
can also be helpful to retrieve information from a website which was available in the
past but is not anymore.
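The WayBack Machine also exposes a simple JSON availability API at http://archive.org/wayback/available, which we can query programmatically. A sketch; the parsing assumes the documented "archived_snapshots"/"closest" response layout:

```python
import json
import urllib.parse
import urllib.request

API = "http://archive.org/wayback/available"

def build_url(site, timestamp=None):
    """Build a query URL for the Wayback availability API."""
    params = {"url": site}
    if timestamp:
        params["timestamp"] = timestamp  # e.g. "20060101" for snapshots near 2006
    return API + "?" + urllib.parse.urlencode(params)

def closest_snapshot(response):
    """Extract the closest snapshot URL from the API's JSON reply, if any."""
    snap = response.get("archived_snapshots", {}).get("closest")
    return snap["url"] if snap and snap.get("available") else None

if __name__ == "__main__":
    # Network access required; asks for the snapshot nearest to Jan 2006.
    with urllib.request.urlopen(build_url("example.com", "20060101")) as reply:
        print(closest_snapshot(json.load(reply)))
```

This is handy when checking many domains for archived copies, where clicking through the calendar interface one site at a time would be tedious.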
REVERSE IMAGE SEARCH
We all are familiar with the phrase "A picture is worth a thousand words" and its veracity,
and are also aware of platforms like Google Images (http://images.google.com),
Flickr (https://www.flickr.com/), and DeviantArt (http://www.deviantart.com/), which
provide us images for the keywords supplied. Usually when we need to look up some
information, we have a keyword or a set of them in the form of text; following the
same lead, the search engines we have dealt with till now take text as input and
get us the results. But in case we have an image and we want to see where it appears
on the web, where do we go? This is where reverse image search engines come in,
which take an image as input and look up its appearances on the web. Let's get
familiar with some of these.
Google Images (http://images.google.com/)
We all are aware that Google allows us to search the web for images, but what many
of us are unaware of is that it also allows us to perform a reverse image search. We
simply need to go to the URL http://images.google.com, click on the camera icon,
and provide the URL of an image on the web or upload a locally stored image file;
we can also drag and drop an image file into the search bar, and voila, Google comes
up with links to the pages containing that or similar images on the web.
FIGURE 4.13
Google reverse image search.
TinEye (https://www.tineye.com/)
TinEye is another reverse image search engine and has a huge database of images.
Similar to Google Images, searching on TinEye is very simple: we can provide the
URL of the image, upload it, or perform a drag and drop. TinEye also provides
browser plugins for major browsers, which make the task much easier. Though the
results of TinEye are not as comprehensive as those of Google Images, it provides a
great platform for the task and is worth a try.
ImageRaider (http://www.imageraider.com/)
Last but not least in this list is ImageRaider. ImageRaider simply lists the
results domain-wise. If a domain contains more than one occurrence of the
image, then it also tells that, and the links to those images are listed under the
domain name.
Reverse image search can be very helpful to find out more about someone when
we are hitting dead ends using conventional methods. As many people use the same
profile picture for various platforms, a reverse image search can
lead us to other platforms where the user has created a profile and may hold previously
undiscovered information.
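Behind the scenes, reverse image engines do not compare raw pixels; they reduce each image to a compact fingerprint and match fingerprints. One classic scheme is the difference hash ("dHash"), sketched here in pure Python over a grayscale image given as a 2D list of brightness values (real engines use far more sophisticated features than this):

```python
def dhash(gray, hash_size=8):
    """Difference hash of a grayscale image given as a 2D list of rows.

    The image is shrunk to (hash_size+1) x hash_size by naive sampling, then
    each bit records whether a pixel is brighter than its right neighbour.
    Similar images yield hashes with a small Hamming distance.
    """
    height, width = len(gray), len(gray[0])
    rows = [gray[(y * height) // hash_size] for y in range(hash_size)]
    sampled = [[row[(x * width) // (hash_size + 1)] for x in range(hash_size + 1)]
               for row in rows]
    bits = []
    for row in sampled:
        for left, right in zip(row, row[1:]):
            bits.append(1 if left > right else 0)
    return bits

def hamming(hash_a, hash_b):
    """Number of differing bits between two hashes (0 means near-identical)."""
    return sum(a != b for a, b in zip(hash_a, hash_b))

# Toy image whose brightness rises left to right.
smooth = [[x for x in range(16)] for _ in range(16)]
print(sum(dhash(smooth)))  # no pixel is brighter than its right neighbour: 0
```

Because the hash survives resizing and small edits, an engine can index billions of such fingerprints and find the few whose Hamming distance to the query image is small.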
MISCELLANEOUS
We have dealt with a huge list of search engines which specialize in their domains and
are popular among their communities. In this section we will be dealing with some different
types of search platforms which are lesser known but serve unique purposes and
are very helpful in special cases.
DataMarket (http://datamarket.com/)
DataMarket is an open portal which consists of large data sets and provides the data
in a great manner through visualizations. The simple search feature provides results
for global topics with list of different visualizations related to the topic, for example,
searching for the keyword gold would provide results such as gold statistics, import/
export of gold, and much more. The results page consists of a bar on the left which
provides a list of filters using which the listed results can be narrowed down. It also
allows us to upload our own data and create visualization from it. Refer to the link
http://datamarket.com/topic/list/ for a huge list of topics on which DataMarket provides
information.
WolframAlpha (http://www.wolframalpha.com/)
In this chapter we learned about various search engines which take some value as
input and provide us with the links which might contain the answer to the questions
we are actually looking for; but what we are going to learn about now is not a search
engine but a computational knowledge engine. What this means is that it takes our
queries as input but does not provide URLs to the websites containing the
information; instead it tries to understand our natural language queries and, based
upon an organized data set, provides a factual answer to them in the form of text and
sometimes an apposite visualization.
Say, for example, we want to know the purpose of the .mil domain; we can simply
type in the query "what is the purpose of the .mil internet domain?" and get
the results. To get the words starting with a and ending with e, a query like "words
starting with a and ending with e" would give us the results. We can even check the
net worth of Warren Buffett with a query like "Warren Buffett net worth." For more
examples of the queries of various domains that WolframAlpha is able to answer,
check out the page http://www.wolframalpha.com/examples/.
FIGURE 4.14
WolframAlpha result.
Addictomatic (http://addictomatic.com)
Usually we visit various different platforms to search for information related to a topic,
but Addictomatic aggregates various news and media sources to create a single dashboard
for any topic of our interest. The content aggregated is displayed in various
sections depending upon the source. It also allows us to move these sections depending
upon our preference for better readability.
Carrot2 (http://search.carrot2.org/stable/search)
Carrot2 is a search results clustering engine: it takes search results from other search
engines and organizes them into topics using its clustering algorithms. This unique
capability allows us to get a better understanding of the results and associated terms.
These clusters are also represented in different interesting forms
such as folders, circles, and FoamTree. Carrot2 can be used through its web
interface, which can be accessed using the URL http://search.carrot2.org/,
and also through a software application which can be downloaded from
http://project.carrot2.org/download.html.
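The idea of turning a flat result list into labelled groups can be illustrated with a deliberately naive sketch: group result titles under their first significant word. Carrot2's actual algorithms (such as Lingo and STC) are far more sophisticated; this only shows the shape of the problem:

```python
from collections import defaultdict

STOPWORDS = {"the", "a", "an", "of", "for", "and", "in", "to"}

def cluster_by_keyword(titles):
    """Naive result clustering: group titles under their first non-stopword."""
    clusters = defaultdict(list)
    for title in titles:
        words = [w for w in title.lower().split() if w not in STOPWORDS]
        label = words[0] if words else "other"
        clusters[label].append(title)
    return dict(clusters)

results = [
    "Python tutorial for beginners",
    "Python web scraping guide",
    "The history of search engines",
]
print(cluster_by_keyword(results))
```

Even this toy version shows why clustering helps: instead of scanning a flat list, we immediately see that two of the three results are about the same topic.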
FIGURE 4.15
Carrot2 search result cluster.
Boardreader (http://boardreader.com/)
Boards and forums are rich sources of information, as a lot of interaction and Q&A goes
on in such places. Members of these platforms range from newbies to experts in the
domain to which the forum is related. In places like this we can get answers to questions
which are difficult to find elsewhere, as they purely comprise user-generated content.
But how do we search them? Here is the answer: Boardreader. It allows us to search
forums to get results which contain content with human interaction. It also displays a
trend graph of the search query keyword to show the amount of activity related to it. The
advanced search features provided, such as sort by relevance, occurrence between
specific dates, domain-specific search, etc., add to its already incredible features.
Omgili (http://omgili.com/)
Similar to Boardreader, Omgili is also a forum and board search engine. It displays
the results in the form of broad bars, and these bars contain information such as date,
number of posts, author, etc., which can be helpful in estimating the relevance of the
result. One such item is Thread Info, which provides further information about
a thread, such as forum name, number of authors, and replies to the thread, without
actually visiting the original thread's forum page. It also allows us to filter the results
based upon the timeline of their occurrence, such as past month, week, day, etc.
Truecaller (http://www.truecaller.com)
Almost everyone who uses or has ever used a smartphone is familiar with the concept
of mobile applications, better known as apps, and many if not most of them have used
the famous app called Truecaller, which helps to identify the person behind a phone
number. What many of us are unaware of is that it can also be used through a web
browser. Truecaller simply allows us to search using a phone number and provides
the user's details using its crowdsourced database.
Other search engines worth trying:
• Meta search engine
• Search (http://www.search.com/)
• People search
• ZabaSearch (http://www.zabasearch.com/)
• Company search
• Hoovers (http://www.hoovers.com/)
• Kompass (http://kompass.com/)
• Semantic
• Sensebot (http://www.sensebot.net/)
• Social media search
• Whostalkin (http://www.whostalkin.com/)
• Twitter search
• Mentionmapp (http://mentionmapp.com/)
• SocialCollider (http://socialcollider.net/)
• GeoChirp (http://www.geochirp.com/)
• Twitterfall (http://beta.twitterfall.com/)
• Source code search
• Meanpath (https://meanpath.com)
• Technology search
• Netcraft (http://www.netcraft.com/)
• Serversniff (http://serversniff.net)
• Reverse image search
• NerdyData image search (https://search.nerdydata.com/images)
• Miscellaneous
• Freebase (http://www.freebase.com/)
So we discussed a huge list of various search engines under various categories
which are not conventionally used, but as we have already seen, these are very useful
in different scenarios. We all are addicted to Google for all our searching needs, and
being one of the best in its domain it has served our purpose most of the time, but
sometimes we need different and specific answers to our queries, and then we need these
kinds of search engines. This list tries to cover most aspects of daily searching
needs, yet surely there must be other platforms which need to be found and used
commonly to solve specific problems.
In this chapter we learned about various unconventional search engines, their
features, and functionalities, but what about the conventional search engines like
Google, Bing, Yahoo, etc. that we use on a daily basis? Oh! We already know how to
use them, or do we? The search engines we use on a daily basis have various advanced
features which many users are unaware of. These features allow us to filter
the results so that we get more information and less noise. In the next chapter
we will be dealing with conventional search engines and will learn how to use them
effectively to perform better searches and get specific results.
CHAPTER 5
Advanced Web Searching
INFORMATION IN THIS CHAPTER
• Search Engines
• Conventional Search Engines
• Advanced Search Operators of various Search Engines
• Examples and Usage
INTRODUCTION
In the last chapter we dealt with some special platforms which allowed us to perform
domain-specific searches; now let's go into the depths of the conventional search
engines which we use on a daily basis and check out how we can utilize them more
efficiently. In this chapter we will understand the working and advanced
search features of some of the well-known search engines and see what functionalities
and filters they provide to serve us better.
So we already have a basic idea about what a search engine is and how it crawls over
the web to collect information, which is further indexed to provide us with search
results. Let's revise it once and understand it in more depth.
Web pages as we see them are not actually what they look like. Web pages basically
contain HyperText Markup Language (HTML) code and, most of the time,
some JavaScript and other scripting languages. HTML is a markup language
and uses tags to structure the information; for example, the tag pair <h1></h1>
is used to create a heading. When we receive this HTML code from the server, our
browser interprets it and displays the web page in its rendered form. To
check the client-side source code of a web page, simply press Ctrl+U in the browser
with the web page open.
Once the web crawler of a search engine reaches a web page, it goes through
its HTML code. Most of the time these pages also contain links to other
pages, which are used by the crawlers to move further in their quest to collect
data. The content crawled by the web crawler is then stored and indexed by the
search engine based on a variety of factors. The pages are ranked based upon their
structure (as defined in HTML), the keywords used, interlinking of the pages,
media present on the page, and many other details. Once a page has been crawled
and indexed, it is ready to be presented to the user of the search engine depending
upon the query.
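The extraction step of this process can be sketched with Python's standard-library HTML parser: pull out the outgoing links (which the crawler will follow) and the visible words (which the indexer will store). This is only a toy illustration of the crawl/index idea, not how any production crawler is built:

```python
from html.parser import HTMLParser

class LinkAndTextExtractor(HTMLParser):
    """Collect outgoing links and visible text from one page's HTML,
    roughly what a crawler extracts before indexing."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        # Anchor tags carry the links the crawler will follow next.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Text between tags is what ends up in the search index.
        self.words.extend(data.split())

page = '<h1>OSINT</h1><p>Open source intelligence.</p><a href="/next">more</a>'
parser = LinkAndTextExtractor()
parser.feed(page)
print(parser.links)  # ['/next']
print(parser.words)  # ['OSINT', 'Open', 'source', 'intelligence.', 'more']
```

A real crawler would repeat this for every discovered link, while the indexer maps each extracted word back to the pages containing it.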
Hacking Web Intelligence. http://dx.doi.org/10.1016/B978-0-12-801867-5.00005-7
Copyright © 2015 Elsevier Inc. All rights reserved.
Once a page has been crawled, the job of the crawler does not finish for that page.
The crawler is scheduled to perform the complete process again after a specific time,
as the content of the page might change. So this process keeps on going, and as new
pages are linked they are also crawled and indexed.
Search engines are a huge industry in themselves, helping us in our web exploration,
but there is another industry which depends directly on search engines, and that is
search engine optimization (SEO). SEO is basically about increasing the rank of
a website/web page, or in other words bringing it up to the first result pages of a
search engine. The motivation behind this is that it increases the visibility of that
page/site and hence brings more traffic, which can be helpful from a commercial or
personal point of view.
Now that we have a good understanding of search engines and how they operate,
let's move ahead and see how we can better use some of the conventional search
engines.
GOOGLE
Google is one of the most widely used search engines and is the starting point for
web exploration for most of us. Initially Google search was accessible through a very
simple interface and provided limited information. Apart from the search box there
were some special search links, links about the company, and a subscription box
where we could enter our e-mail to get updates. There were no ads, no different
language options, no login, etc.
It’s not only the look and feel of the interface that has changed over the years but
also the functionalities. It has evolved from providing simple web links to the pages
containing relevant information to a whole bunch of related tools which not only
allow us to search different media types and categories but also narrow down these
results using various filters. Today there are various categories of search results such
as images, news, maps, videos, and much more. These plethora of functionalities
provided by Google today has certainly made our lives much easier and made the act
of finding information on the web a piece of cake. Still sometimes we face difficulty
in finding the exact information we are looking for and the main reason behind it is
not the lack of information but to the contrary the abundance of it.
Let’s move on to see how we perform Google search and how to improve it. So
whenever we need to search something in Google we simply think about some of the
keywords associated with it and type them into the search bar and hit Enter. Based
upon the indexing Google simply provides us with the associated resources. Now if
we want to get better results or filter the existing results based upon various factors,
we need to use Google advanced search operators. Let's have a look at these operators
and their usage.
site:
It fetches results only from the site provided. It is very useful when we want to limit our
search to some specific domain. It can be used with another keyword, and Google
will bring back related pages from the site specified. From an information security
perspective, it is very useful for finding different subdomains related to a particular
domain.
Examples: site:gov, site:house.gov
FIGURE 5.1
Google “site” operator usage.
inurl:
This operator allows looking for keywords in the uniform resource locator (URL) of
the site. It is useful to find pages which follow a usual keyword for specific pages,
such as contact us. Generally, as the URL contains some keywords associated with
the body content, it helps us find the equivalent page for the keyword we
are searching for.
Example: inurl:hack
allinurl:
Similar to "inurl," this operator allows looking for multiple keywords in the URL. This
also enhances the chances of getting quality content for what we are looking for.
Example: allinurl:hack security
intext:
This operator makes sure that the keyword specified is present in the text of the page.
Sometimes, just for the sake of SEO, some pages only contain keywords to enhance
the page rank but not the associated content. In that case we can use this operator
to get the appropriate content from a page for the keyword we are looking for.
Example: intext:hack
allintext:
Similar to "intext," this operator allows us to look up multiple keywords in the
text. As we discussed earlier, searching for multiple keywords enhances the quality
of the content in the result page.
Example: allintext:data marketing
intitle:
It allows us to restrict the results by the keywords present in the title of the pages
(title tag: <title>XYZ</title>). It can be helpful to identify pages which follow a
convention for their titles, such as directory listings with the keywords "index of."
Most sites also put keywords in the title to improve page rank, so this operator
helps when searching for a particular keyword.
Example: intitle:blueocean
allintitle:
This is the multiple keyword counterpart of the "intitle" operator.
Example: allintitle:blueocean market
filetype:
This operator is used to find files of a specific kind. It supports multiple file types
such as pdf, swf, kml, doc, svg, txt, etc. This operator comes in handy when we are
only looking for a specific type of file on a specific domain.
Examples: filetype:pdf, site:xyz.com filetype:doc
ext:
The operator ext simply stands for extension, and it works similarly to the filetype
operator.
Example: ext:pdf
define:
This operator is used to find out the meaning of the keyword supplied. Google returns
dictionary meaning and synonyms for the keyword.
Example: define:data
AROUND
This operator is helpful when we are looking for results which contain two
different keywords in close association. It allows us to restrict the maximum
number of words between the two keywords in the search results.
Example: A AROUND(6) Z
AND
A simple Boolean operator which makes sure the keywords on both sides are present
in the search results.
Example: data AND market
OR
Another Boolean operator which provides search results containing either of the
keywords on the two sides of the operator.
Example: data OR intelligence
NOT
Yet another Boolean operator, which excludes the search results that contain the
keyword following it.
Example: lotus NOT flower
" "
This operator is useful when we need to search for results which contain the
provided keywords in the exact sequence. For example, we can search for pages which
contain quotes or some lyrics.
Example: "time is precious"
-
This operator excludes the search results which contain the keyword following it
(with no space between the operator and the keyword).
Example: lotus -flower
*
This wildcard operator is used as a generic placeholder for an unknown term.
We can use it to find quotes which we only partially remember, or to check variants
of one.
Example: "* is precious"
..
This special operator is used to provide a number range. It is quite useful for enforcing
a price range, a time range (dates), etc.
Example: japan volcano 1990..2000
Copyrighted material
82 CHAPTER 5 Advanced Web Searching
info:
The info operator provides the information that Google has on a specific domain. Links
to different types of information are present in the results, such as the cache, similar
websites, etc.
Example: info:elsevier.com
related:
This operator is used to find other web pages similar to the provided domain. It
is very helpful when we are looking for websites which provide services similar to
those of a given website, or to find its competitors.
Example: related:elsevier.com
cache:
This operator redirects to the latest cache of the page that Google has crawled. In case we
don’t get a result for a website which was accessible earlier, this is a good option to try.
Example: cache:elsevier.com
Advanced Google search can also be performed using the page
http://www.google.com/advanced_search, which allows us to perform restricted
search without using the operators mentioned above.
FIGURE 5.2
Google advanced search page.
Apart from the operators, Google also provides some keywords which allow us to
check information about current events and perform some other useful tasks.
Some examples are:
time
Simply entering this keyword displays the current time of the location we are residing
in. We can also use the name of a region to get its current time.
Example: time france
weather
This keyword shows the current weather condition of our current location. Similar to
“time” keyword we can also use it to get the weather conditions of a different region.
Example: weather Sweden
Calculator
Google solves mathematical expressions and also provides a calculator.
Example: 39*(9823-312)+44/3
Converter
Google can be used to perform conversions between different types of units, such as
measurement units, currency, time, etc.
Example: 6 feet in meters
This is not all; sometimes Google also shows relevant information related to
global events as and when they happen, for example, the FIFA World Cup.
Apart from searching the web in general, Google also allows us to search specific
categories such as images, news, videos, etc. All these categories, including web,
have some common and some specific search filters of their own. These options can
simply be accessed by clicking on the “Search tools” tab just below the search bar.
For web we can find options which allow us to restrict the results based upon the
country and time of publishing; for images there are options like the color of the
image, its type, usage rights, etc., and similarly other relevant filters for the other
categories. These options can be very helpful in finding the required information of
a category as they are designed according to that specific category. For example, if
we are looking for an old photograph of something, it is a good idea to see only the
results which are black and white.
The operators we discussed are certainly very useful for anyone who needs to find
information on the web, but the InfoSec community has taken them to the next level.
These simple and innocent-looking operators are widely used in the cyber security
industry to find and demonstrate how critical and compromising information can be
retrieved without even touching the target system. This technique of using Google
search engine operators to find such information is termed “Google Hacking.”
When it comes to Google Hacking, one name that immediately comes to mind is Johnny
Long. Johnny was an early adopter and pioneer in the field of crafting Google
queries which could provide sensitive information related to a target. These queries
are widely popular by the name Google Dorks.
Let’s understand how this technique works. We saw a number of operators which
can narrow down search results to a specific domain, filetype, title value, etc. Now
in Google Hacking our motive is to find sensitive information related to the target;
for this, people have come up with various signatures for different files and pages
which are known to contain such information. For example, let’s say we know the
name of a sensitive directory which should not be directly accessible to any user
publicly, but remains public by default after the installation of the related application.
Now if we want to find the sites which have not changed the accessibility of this
directory, we can simply use the query “inurl:/sensitive_directory_name/” and we
will get a bunch of websites which haven’t changed the setting. If we want to further
narrow it down to a specific website, we can combine the query with the operator
“site,” as “site:targetdomain.com inurl:/sensitive_directory_name/.” Similarly we
can find sensitive files residing on a website by using the operators “site” and
“filetype” in combination.
Let’s take another example of Google Hacking which can help us discover a high-
severity vulnerability in a website. Many developers use Flash to make websites more
interactive and visually appealing. Small web format (SWF) is a Flash file format used
to create such multimedia. Many SWF players are known to be vulnerable to cross-site
scripting (XSS), which could lead to an account compromise. If we want to find out
whether the target domain is vulnerable to such an attack, we can simply put in
the query “site:targetdomain.com filetype:swf SWFPlayer_signature_keyword” and test
the resulting pages using publicly available payloads to verify. There are a huge number
of signatures to find various types of pages such as sensitive directories, web server
identification, files containing usernames/passwords, admin login pages, and much more.
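Because dorks are just combinations of the operators we have seen, they can be assembled mechanically. Below is a minimal, hypothetical helper (the function name is my own, and the placeholder values mirror the placeholders used in the text) that composes such queries:

```python
def build_dork(site=None, filetype=None, inurl=None, intitle=None, keywords=""):
    """Compose a Google dork from the operators discussed in this chapter.

    Each argument except `keywords` maps directly to one operator; an
    operator is simply skipped when its argument is None.
    """
    parts = []
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    if intitle:
        parts.append(f"intitle:{intitle}")
    if keywords:
        parts.append(keywords)
    return " ".join(parts)

# The sensitive-directory example from the text (the directory name is a placeholder):
print(build_dork(site="targetdomain.com", inurl="/sensitive_directory_name/"))
# The SWF example (the signature keyword is a placeholder):
print(build_dork(site="targetdomain.com", filetype="swf",
                 keywords="SWFPlayer_signature_keyword"))
```

A helper like this is convenient when testing the same signature against a list of target domains.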
The Google Hacking Database created by Johnny Long can be found at
http://www.hackersforcharity.org/ghdb/. Though it is no longer updated, it is still a
great place to understand and learn how we can use Google to find sensitive
information. A regularly updated version can be found at
http://www.exploit-db.com/google-dorks/.
FIGURE 5.3
Google hacking database - www.exploit-db.com/google-dorks/.
BING
Microsoft has been providing search engine solutions for a long time, under different
names. Bing is the latest and most feature-rich search engine in this series. Unlike
its predecessors, Bing provides a cleaner and simpler interface. As Microsoft covers
a major part of the operating system market, the general perception is that Bing is
just another side product from a technology giant, and hence many users do not take
it seriously. That perception is wrong. Like all search engines, Bing has some unique
features that will make you turn to it when you need them, and those features leave
a distinct mark on how we search. We will discuss not only the special features but
also the general operators, which will allow us to understand the search engine and
its functionality.
+
This operator works quite similarly in all the search engines. It allows a user to
forcefully include single or multiple keywords in a search query. Bing will make sure
that the keywords following the + operator are present in the result pages.
Example: power +search
-
This operator is also known as the NOT operator. It is used to exclude something
from a set of things, such as excluding a cuisine.
Example: Italian food -pizza
Here Bing will display all the Italian foods available, but not pizza. We can write
this in another form which fetches the same result:
Example: Italian food NOT pizza
""
This also works the same in most of the search engines. It is used to search for the
exact phrase placed inside the double quotation marks.
Example: “How to do Power Searching?”
|
This is also known as the OR operator, mostly used for getting results matching one
of the two (or more) keywords joined with this operator.
Example: ios | android
ios OR android
&
This operator is also known as the AND operator. It is the default search operator:
if we simply enter multiple keywords, Bing will perform an AND search in the
backend and give us the result.
Example: power AND search
power & search
As this is the default behavior, it is important to keep in mind that unless we write
OR and NOT in capitals, Bing won’t treat them as operators.
()
This can be called the group operator.
Bing operators are grouped in the following precedence order:
()
NOT / -
AND / &
OR / |
As parentheses have the top priority, we can place lower-priority operators such as
OR inside them and create a grouped query so that the lower-priority operators
execute first.
Example: android phone AND (nexus OR xperia)
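To see how this precedence behaves, here is a toy matcher (entirely my own illustration, not Bing's actual logic). Python's `and`/`or`/`not` and parentheses group in the same relative order as Bing's `&`, `|`, and `NOT`, so a Bing query can be mirrored one-to-one:

```python
def matches(text, predicate):
    """Return True if `text` satisfies `predicate`, where `predicate` is a
    function over a `has(word)` membership test. Python's and/or/not mirror
    Bing's &, | and NOT, and parentheses group in exactly the same way."""
    words = set(text.lower().split())
    return predicate(lambda w: w.lower() in words)

# "android phone AND (nexus OR xperia)" from the text:
query = lambda has: has("android") and has("phone") and (has("nexus") or has("xperia"))

print(matches("the new android phone nexus 5", query))    # True
print(matches("android phone with great camera", query))  # False
```

Without the parentheses, the lower-priority OR would apply to the whole expression and the query would mean something quite different, which is exactly why grouping matters.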
site:
This operator will help to search a particular keyword within a specific website. This
operator works quite the same in most of the search engines.
Example: site:owasp.org clickjacking
filetype:
This allows a user to search for data in a specific type of file. Bing supports fewer
file types than Google, though most of the common ones supported by Google are
also supported by Bing.
Example: hack filetype:pdf
ip:
This unique operator provided by Bing allows us to search web pages based upon an
IP address. Using it we can perform a reverse IP search, which means it allows us to
look for pages hosted on the specified IP.
Example: ip:176.65.66.66
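Since `ip:` expects a literal address, it can help to validate the input before building the query. A small sketch using Python's standard `ipaddress` module (the helper name is my own):

```python
import ipaddress

def bing_reverse_ip_query(ip: str) -> str:
    """Validate `ip` and return a Bing reverse-IP query string.

    ipaddress.ip_address raises ValueError for anything that is not a
    literal IPv4 or IPv6 address, so malformed input fails early.
    """
    ipaddress.ip_address(ip)
    return f"ip:{ip}"

print(bing_reverse_ip_query("176.65.66.66"))  # ip:176.65.66.66
```

Resolving a hostname to its IP first (e.g. with `socket.gethostbyname`) and then feeding the result to this helper is the usual way to turn a domain into a reverse-IP query.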
FIGURE 5.4
Bing "ip" search.
feed:
Yet another unique operator provided by Bing is feed, which allows us to look for
web feed pages containing the provided keyword.
One other feature that Bing provides is to perform social search using the page
https://www.bing.com/explore/social. It allows us to connect our social network
accounts with Bing and perform search within them.
FIGURE 5.5
Bing social search.
YAHOO
Yahoo is one of the oldest players in the search engine arena and has been quite
popular. The Yahoo search page also carries a lot of content such as news, trending
topics, weather, financial information, and much more. Earlier Yahoo utilized third-
party services to power its search capabilities; later it became independent, and it
has now once again joined forces with Bing for its search services. Though Yahoo
does not offer too much in terms of advanced searching compared to other search
engines, the operators it does provide are worth trying. Let’s see some of the
operators that can be useful.
+
This operator is used to make sure the search results contain the keyword followed by it.
Example: +data
-
Opposite to the “+” operator, this operator is used to exclude any specific keyword
from the search results.
Example: -info
OR
This operator allows us to get results for either of the keywords supplied.
Example: data OR info
site:
This operator restricts the results to the site provided. We will only get to see links
from the specified website. There are two other operators which work like this one
but do not provide results as accurate or in-depth: “domain” and “hostname”. Their
usage is similar to the “site” operator.
Example: site:elsevier.com
link:
This is another interesting operator, which allows us to look up web pages that link
to the specific web page provided. While using this operator, keep in mind to provide
the URL with the protocol (http:// or https://).
FIGURE 5.7
Yahoo advanced search page.
YANDEX
Yandex is a Russian search engine which is not very popular outside the country, but
it is one of the most powerful search engines available. Like Google, Bing, and
Yahoo, it has its own unique keywords and indexed data. Yandex is the most popular
and widely used search engine in Russia, and the fourth largest search engine in the
world. Apart from Russia, it is also used in countries like Ukraine, Kazakhstan,
Turkey, and Belarus. It is also a most underrated search engine, as its use is largely
limited to specific countries, but the security community sees it otherwise. Most
people are either happy with their conventional search engine or think that all the
information on the internet is available through the search engine they are using.
The fact is that search engines like Yandex have many unique features that can
provide far more efficient results than other search engines.
Here we will discuss how Yandex can be a game changer in searching for data on
the internet and how to use it efficiently.
As discussed earlier, like other search engines Yandex has its own operators, such
as lang, parentheses, Boolean operators, and more. Let’s get familiar with these
operators and their usage.
+
This operator works much the same in all the search engines. In Yandex too, the +
operator is used to require a keyword in the result pages. The keyword added after
the + operator is the primary keyword in the search query, and every result fetched
by the search engine must contain it.
Example: power /4 searching
Yandex will make sure that the result pages contain these two keywords within four
words of each other, irrespective of keyword position. That means the order in which
we wrote the keywords in the query might differ on the result page.
What if we need to fix the order? Yes, Yandex has a solution for that also: adding a
+ sign to the number.
Example: power /+4 searching
Adding the + operator before the number forces Yandex to respond only with pages
where these two keywords appear in the same order and within a four-word count.
What if we need the reverse of it? Let’s say we want results where the keyword
“searching” comes first and “power” appears after it within a four-word count, and
not vice versa. In that case a negative number comes in pretty handy: we can use the
- sign to reverse what we just did.
Example: power /-4 searching
This will only display pages which contain the keyword “searching” with “power”
appearing after it within a four-word count.
Let’s say we want to set up a radius or boundary for one keyword with respect to
another; in that case we have to specify that keyword in the second position.
Example: power /(-3 +4) searching
Here we are setting up a radius for “searching” with respect to “power”. This means
that a page is shown in the results only if “searching” is found within three words
before or four words after “power”.
This can be helpful when we are searching for two people’s names. In that case we
cannot guess which name will come first and which will come next, so it’s better to
create a radius for those two names, and the query will serve our purpose.
We have discussed a lot about word-based proximity search; now let’s shed some
light on sentence-based proximity search. For sentence-based search we can use the
Yandex && operator together with this number operator.
Example: power && /4 searching
In this case we get result pages containing these two keywords within a four-sentence
distance, irrespective of the position of the keywords: either “power” may come first
and “searching” after it, or vice versa.
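The word-distance semantics above can be illustrated locally with a toy matcher. This is entirely my own sketch of the /n and /+n behavior described, not Yandex's implementation:

```python
def within(tokens, a, b, n, ordered=False):
    """True if words `a` and `b` occur within `n` words of each other in
    `tokens`. With ordered=True, `a` must come before `b`, mirroring the
    /+n variant; the default mirrors plain /n."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    for i in pos_a:
        for j in pos_b:
            dist = j - i if ordered else abs(j - i)
            if 0 < dist <= n:
                return True
    return False

text = "power of advanced web searching".split()
print(within(text, "power", "searching", 4))                # True: satisfies /4
print(within(text, "searching", "power", 4, ordered=True))  # False: /+4 order matters
```

The /( -3 +4 ) radius form could be modeled the same way by checking `-3 <= j - i <= 4` instead of a single bound.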
!
This operator does something special, and it is one of my favorites. It gives a user
the freedom to search only for a specific keyword, without similar-word or extended
search. What happens in a general search is that if you search for a keyword, let’s
say AND, you will get some results showing only AND, and then the results will
extend to ANDroid or AMD and so on. If we want results only for the keyword
AND, we use this operator.
Example: !and
This restricts the search engine to provide results showing only pages which contain
this particular keyword, AND.
!!
This can be used to search for the dictionary form of the keyword.
Example: !!and
()
When we want to create a complex query with different keywords and operators, we
can use brackets to group them. We already used these brackets above; now let’s see
another example to understand their true power.
FIGURE 5.8
Yandex complex query.
Example: power && (+searching | !search)
Here the query will search for both sets of keywords, first “power searching” and
then “power search”, but not both in the same result.
""
What if, rather than a keyword, we want to search for a particular string or set of
keywords? Here this operator comes to the rescue. It is quite similar to Google’s “”.
It allows a user to search for the exact keywords or string put inside the double
quotes.
Example: “What is OSINT?”
It will search for the exact string and, if available, will give us the results accordingly.
*
This operator can be referred to as the wildcard operator. Its use is much the same
in most of the search engines: it fills in the missing keyword, suggesting relevant
keywords according to the other keywords used in the search query.
Example: osint is * of technology
Yandex will auto-fill the position where * is used to complete the query with relevant
keywords; in this case that can be “ocean” or “treasure” or anything else. We can
also use this operator with double quotes to get a more efficient and accurate result.
Example: “OSINT is * of technology”
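A local approximation of this placeholder behavior can be written with the standard `re` module. The sketch below (my own illustration, not the engines' actual matching logic) turns each * into a pattern matching one or more words:

```python
import re

def wildcard_to_regex(query: str):
    """Compile a quoted wildcard query like "* is precious" into a regex
    where each * matches one or more whole words."""
    escaped = re.escape(query)
    pattern = escaped.replace(re.escape("*"), r"\w+(?:\s+\w+)*")
    return re.compile(pattern, re.IGNORECASE)

rx = wildcard_to_regex("OSINT is * of technology")
print(bool(rx.search("OSINT is an ocean of technology")))  # True
print(bool(rx.search("OSINT is technology")))              # False
```

This is handy when filtering scraped text for phrases you only partially remember, mirroring the partial-quote use case described for Google's * operator.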
|
This is quite similar to Google’s OR operator. It allows us to provide different
keywords where we want results for any one of them. In a real-world scenario we can
use this operator to search among options. Let’s say I want to buy a laptop and I have
different choices; in that case this operator comes into the picture.
Example: dell | toshiba | macbook
Here we can get results for any of these three options, but not all in one result.
<<
This is an unusual operator known as the non-ranking “AND.” It is used to add
additional keywords to the query without impacting the ranking of the websites in
the result. In simple words, it can be used to tag additional keywords onto the query
list without affecting the page rankings.
Example: power searching << OSINT
Here it additionally searches for OSINT along with the other two keywords, without
impacting the page rankings in the result page.
title:
This is quite equivalent to Google’s “intitle”. It can be used to search for pages with
the keyword(s) specified after the title query parameter.
Example: title:osint
This will provide pages that contain OSINT in the title of the web page. Similarly
we can use this query parameter to search for more than one keyword.
Example: title:(power searching)
url:
This “url” search query parameter searches for the exact URL provided by the user
in the Yandex database.
Example: url:http://attacker.in
Here Yandex will provide a result if and only if the URL has been crawled and
indexed in its database.
inurl:
This can be used to search for keywords present in a URL, in other words a URL
fragment search. The “inurl” query parameter works much the same in all the search
engines.
Example: inurl:osint
It will return all the URLs that contain the keyword osint, no matter what the
position of the keyword is.
mime:
This query parameter is quite similar to Google’s “filetype”. It helps a user search
for a particular file type.
Example: osint mime:pdf
FIGURE 5.9
Yandex file search.
It will provide us all the PDF links that contain the osint keyword. The file types
supported by Yandex mime are:
PDF, RTF, SWF, DOC, XLS, PPT, DOCX, PPTX, XLSX, ODT, ODS, ODP, ODG
host:
This can be used to search all the available hosts. It is mostly used by penetration
testers.
Example: host:owasp.org
rhost:
This is quite similar to host, but “rhost” searches for reverse hosts. Penetration
testers can use it to get reverse host details.
It can be used in two ways: one for subdomains, by using the wildcard operator * at
the end, and one without it.
Example: rhost:org.owasp.*
rhost:org.owasp.www
site:
This operator is like the best friend of a penetration tester or hacker, and is available
in most of the search engines. It provides all the indexed pages and subdomains of
the provided domain.
For penetration testers or hackers, finding the right place to look for vulnerabilities
is most important. As in most cases the main sites are well secured compared to the
subdomains, an operator that simplifies the process by listing the subdomains for a
hacker or penetration tester gets half the work done. So the importance of this
operator is definitely felt in the security industry.
Example: site:http://www.owasp.org
It will provide all the available subdomains of the domain owasp.org as well as all
its pages.
date:
This query can be used either to limit the search to a specific date or, with a little
enhancement to the query, to a specific period.
Example: date:201408*
Here the date format used is YYYYMMDD, but in place of the DD we used the
wildcard operator, so we will get results limited to August 2014.
We can also limit the same to a particular date of August 2014 by changing the
query a bit:
date:20140808
It will only show results belonging to that date.
We can also use “=” in place of “:” and it will work the same. So the above queries
can be changed to
date=201408*
date=20140808
As discussed earlier, we can also limit the search results to a particular time period.
Let’s say we want to search something from a particular date to the present. In that
case we can use
date=>20140808
It will provide results from 8th August 2014 to the present. But what if we want to
limit both the start date and the end date? In that case also, Yandex provides a way
to give a range.
date=20140808..20140810
Here we will get the results from 8th August 2014 to 10th August 2014.
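These date forms are easy to generate with the standard `datetime` module; a small sketch (the helper name is my own):

```python
from datetime import date

def yandex_date_query(start, end=None):
    """Build a Yandex date restriction in the YYYYMMDD form shown above.

    With only `start` given, an open-ended 'from this date onwards'
    query is returned; with `end` given, a closed range is returned.
    """
    if end is None:
        return f"date=>{start:%Y%m%d}"
    return f"date={start:%Y%m%d}..{end:%Y%m%d}"

print(yandex_date_query(date(2014, 8, 8)))                     # date=>20140808
print(yandex_date_query(date(2014, 8, 8), date(2014, 8, 10)))  # date=20140808..20140810
```

Building the string from `date` objects rather than by hand also catches impossible dates early, since `date(2014, 13, 1)` raises an error instead of producing a silently broken query.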
domain:
This can be used to restrict the search results based on top-level domains (TLDs).
Mostly this type of domain search is done to get results from country-specific
domains. Let’s say we want to get a list of CERT-empanelled security service
providing companies from different countries. In that case we can search for the
country-specific domain extension; for New Zealand the TLD is nz, so we can craft
a query like
Example: “cert empanelled company” domain:nz
lang:
It can be used to search pages written in specific languages.
Yandex supports some specific languages such as
RU: Russian
UK: Ukrainian
BE: Belorussian
EN: English
FR: French
DE: German
KK: Kazakh
TT: Tatar
TR: Turkish
Though we can always use Google Translate to translate a page from any language
to English or any other language, this is an added feature provided by Yandex to
fulfill the requirements of the regions where Yandex is popularly used.
So to search for a page we need to provide the short form of the language.
Example: power searching lang:en
It will search for pages in English that contain power searching.
cat:
This is also something unique provided by Yandex. Cat stands for category. Yandex
categorizes different things based on region id or topic id. Using cat we can search
for results based on a region or topic assigned in the Yandex database.
The details of regional codes: http://search.yaca.yandex.ru/geo.c2n.
The details of topic codes: http://search.yaca.yandex.ru/cat.c2n.
Though these pages contain data in the Russian language, we can always use Google
Translate to serve this purpose.
As we noted at the beginning, Yandex is an underrated search engine, and some of
its cool features are definitely going to leave a mark on us once we go through this
chapter. One such feature is its advanced search GUI.
There are lazy people like me who want everything in a GUI, so that they just have
to customize everything by providing limited details and selecting some checkboxes
or radio buttons. Yandex provides that at the link below:
http://www.yandex.com/search/advanced?&lr=10558
Here we just have to select what we want, and most importantly it covers most of
the operators we discussed above. So go to the page, select what you want, and
search efficiently using the GUI.
After going through all these operators we can easily feel the impact of advanced
search, or we could also use the term power search for it. Advanced search gives a
user faster, more efficient, and more reliable results. It reduces the manual effort
needed to get the desired data, and the content quality is also better, as we limit the
search to what we are actually looking for, be it a country-specific domain search,
a particular file type, or content from a specific date. These things cannot be done
easily with a simple keyword search.
We are in an age where information is everything, and the reliability factor also
comes into the picture. If we want a bulk of reliable information from the net in a
very short time span, then we need to focus on advanced search. We can use any
conventional search engine of our choice. Most of the search engines have quite
similar operators to serve the purpose, but there are some special features present;
so look for those special features and use different search engines for different
customized advanced searches.
So we learned about various search engines and their operators and how to utilize
these operators to search better and get precise results. For some operators we saw
their individual operation and how they can help to narrow down the results, and for
some we saw how they can be combined with other operators to generate a great
query which directly gets us to what we want. Though some operators for different
search engines work more or less in the same fashion, as the crawling and indexing
techniques of different platforms differ, it is worthwhile to check which one of them
provides better results depending upon our requirements. One thing we need to keep
in mind is that search providers keep deprecating operators or features which are not
used frequently enough, and some functionalities are not available in some regions.
We saw how easily we can get the results we actually want with the use of some
small but effective techniques. The impact of these techniques is not just limited to
finding links to websites; used creatively, they can be applied in various fields. Apart
from finding information on the web, which certainly is useful for everyone, these
techniques can be used to find details which are profession specific. For example, a
marketing professional can scale the size of a competitor’s website using the operator
“site,” or a sales professional can find emails for a company using the wildcard
operator: “*@randomcompany.com.” We also saw how search engine dorks are used
by cyber security professionals to find sensitive and compromising information just
by using some simple keywords and operators. The takeaway here is not just to learn
about the operators but also how we can use them creatively in our profession.
In this and some previous chapters we have covered a lot about how to perform
searches using different search platforms. Till now we have mainly focused on
browser-based applications, or we can say web applications. In the next chapter we
will move on and learn about various tools which need to be installed as applications
and which provide various features for extracting data related to various fields, using
various methods.
CHAPTER
OSINT Tools and
Techniques
INFORMATION IN THIS CHAPTER
• OSINT Tools
• Geolocation
• Information Harvesting
• Shodan
• Search Diggity
• Recon-ng
• Yahoo Pipes
• Maltego
INTRODUCTION
In the previous chapters we learned about the basics of the internet and effective
ways to search it. We went into great depth, from searching social media to
unconventional search engines, and further learned effective techniques for using
regular search engines. In this chapter we will move a step further and discuss some
of the automated tools and web-based services which are used frequently to perform
reconnaissance by professionals of various intelligence-related domains, especially
information security. We will start from the installation part, move to understanding
their interfaces, and further learn about their functionality and usage. Some of these
tools provide a rich graphical user interface (GUI) and some of them are command
line based (CLI), but don’t judge them by their interface; judge them by their
functionality and relevance to our field of work.
Before moving any further we must install the dependencies for these tools so that
we don't face any issues during their installation and usage. The packages we need
are:
• Java (latest version)
• Python 2.7
• Microsoft .NET Framework v4
We simply need to download the relevant package for our system configuration
and we are good to go.
Hacking Web Intelligence. http://dx.doi.org/10.1016/B978-0-12-801867-5.00006-9
Copyright © 2015 Elsevier Inc. All rights reserved.
CREEPY
Most of us are addicted to social networks, and image sharing is one of the most utilized
features of these platforms. But sometimes when we share these pictures it's not just the
image we are sharing: we may also be sharing the exact location where the picture was taken.
Creepy is a Python application which can extract this information and display
the geolocation on a map. Currently Creepy supports searches on Twitter, Flickr, and
Instagram. It extracts geolocation from the EXIF information stored in images,
from geolocation data available through the application programming interface (API),
and through some other techniques.
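As an illustration of the EXIF technique: a photo's geolocation is stored as degree/minute/second values plus a hemisphere reference, which must be converted to decimal degrees before plotting. A minimal sketch of that conversion follows (reading the raw tags from a file would additionally need a library such as Pillow or exifread, not shown here):

```python
# Convert EXIF-style GPS data (degrees, minutes, seconds plus hemisphere
# reference) into the decimal degrees used by mapping applications.
def dms_to_decimal(degrees, minutes, seconds, ref):
    value = degrees + minutes / 60.0 + seconds / 3600.0
    if ref in ("S", "W"):  # southern/western hemispheres are negative
        value = -value
    return value

# Illustrative coordinates, not taken from any real image:
lat = dms_to_decimal(40, 44, 54.36, "N")
lon = dms_to_decimal(73, 59, 8.36, "W")
```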
It can be downloaded from http://ilektrojohn.github.io/creepy/. We simply need
to select the version matching our platform and install it. The next phase after
installing Creepy is to configure the plugins available in it: we simply click the
Plugin Configuration button under the Edit tab. Here we can select the plugins and
configure each one using its individual configuration wizard. Once the configuration
is done we can check whether it is working properly using the Test Plugin
Configuration button.
FIGURE 6.1
Configure Creepy.
After the configuration phase is done, we can start a new project by clicking on
the person icon on the top bar. Here we can name the project and search for people
on different portals. From the search results we can select the person of interest and
include him/her in the target list and finish the wizard. After this our project will be
displayed under the project bar at the right-hand side.
FIGURE 6.2
Search users.
Now we simply need to select our project and click on the target icon or right
click on the project and click Analyze Current Project. After this Creepy will start
the analysis, which will take some time. Once the analysis is complete, Creepy will
display the results on the map.
FIGURE 6.3
Creepy results.
Now we can see the results: the map is populated with markers according to the
identified geolocations. Creepy further allows us to narrow down these results based
on various filters.
103
Copyrighted material
104 CHAPTER 6 OSINT Tools and Techniques
Clicking on the calendar button allows us to filter the results based on a time
period. We can also filter the results based upon area, which we can define in the form
of radius in kilometers from a point of our choice. We can also see the results in the
form of a heat map instead of the markers. The negative sign (-) present at the end
can be used to remove all the filters imposed on the results.
FIGURE 6.4
Applying filter.
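The radius filter described above reduces to a great-circle distance test against each marker. A sketch of that check using the standard haversine formula (the function names are ours, not Creepy's):

```python
# Haversine great-circle distance, plus a "within N km of a point" filter of
# the kind Creepy applies to its markers.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    r = 6371.0  # mean Earth radius in kilometers
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def within_radius(points, center, radius_km):
    clat, clon = center
    return [p for p in points if haversine_km(p[0], p[1], clat, clon) <= radius_km]
```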
The results we get from Creepy can be downloaded as a CSV file or as KML,
which can be used to display the markers in another map application.
Creepy can be used during the information-gathering phase of a pentest
(penetration test) and also as a proof-of-concept tool to demonstrate to users what
information they are revealing about themselves.
FIGURE 6.5
Download Creepy results.
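To give an idea of what the KML export contains: KML is just XML with one placemark per location. The sketch below builds such a document from a hypothetical CSV layout (latitude, longitude, and name columns are assumed for illustration; Creepy's actual export columns may differ):

```python
# Build a minimal KML document from CSV rows of geolocated results.
# Note: KML coordinates are longitude,latitude order.
import csv, io
from xml.sax.saxutils import escape

def csv_to_kml(csv_text):
    placemarks = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        placemarks.append(
            "<Placemark><name>{}</name>"
            "<Point><coordinates>{},{}</coordinates></Point></Placemark>".format(
                escape(row["name"]), row["longitude"], row["latitude"]))
    return ('<?xml version="1.0" encoding="UTF-8"?>'
            '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
            + "".join(placemarks) + "</Document></kml>")

sample = "latitude,longitude,name\n48.8584,2.2945,Tweet near the Eiffel Tower\n"
kml = csv_to_kml(sample)
```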
THEHARVESTER
TheHarvester is an open source intelligence (OSINT) tool for obtaining e-mail
addresses, employee names, open ports, subdomains, host banners, etc. from public
sources such as search engines like Google and Bing and sites such as LinkedIn. It's
a simple Python tool which is easy to use and contains different information-gathering
functions. Being a Python tool, it naturally requires Python to be installed on our
system. The tool was created by Christian Martorella and is one of the simplest, most
popular, and most widely used information-gathering tools.
TheHarvester can be found here: http://www.edge-security.com/theharvester.php
Generally we input a domain name or company name to collect relevant
information such as email addresses, subdomains, or the other details mentioned in
the paragraph above, but we can also use keywords to collect related information.
We can narrow our search by specifying which particular public source to use for the
information gathering. There are lots of public sources that theHarvester uses, but
before moving to those let's understand how to use it.
Ex: theharvester -d example.com -l 500 -b google
-d = the domain name or company name
-l = the limit on the number of results to work with
-b = the data source; in the above command it is Google, but apart from that we
can use LinkedIn, or "all" to use all the available public sources.
FIGURE 6.6
TheHarvester in action.
Apart from those mentioned above, theHarvester also has other options, such as:
-s = to start with a particular result number (the default value is 0)
-v = to get virtual hosts by verifying hostnames via DNS resolution
-f = to save the data (available formats: HTML or XML)
-n = to perform a DNS resolve query for all the discovered ranges
-c = to perform a DNS bruteforce for all domain names
-t = to perform a DNS TLD expansion discovery
-e = to use a specific DNS server
-l = to limit the number of results to work with
-h = to use the Shodan database to query discovered hosts
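Under the hood, e-mail harvesting largely amounts to scraping the pages fetched from these sources for addresses on the target domain. A toy sketch of that core step (a simplified regex pass over sample text, not theHarvester's actual code):

```python
# Toy illustration of the harvesting step: pull domain-scoped addresses out
# of text that would have been fetched from a search engine's result pages.
import re

def harvest_emails(text, domain):
    # Case-insensitive match for anything that looks like user@<domain>.
    pattern = r"[A-Za-z0-9._%+-]+@" + re.escape(domain)
    return sorted(set(m.lower() for m in re.findall(pattern, text, re.IGNORECASE)))

page = ("Contact sales@Example.com or support@example.com; "
        "unrelated: admin@other.org, sales@example.com")
emails = harvest_emails(page, "example.com")
```

The real tool also de-duplicates across many sources and paginates through results; the regex above only shows the extraction idea.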
FIGURE 6.7
TheHarvester HTML results.
The sources it uses are Google, Google profiles, Bing, pretty good privacy
(PGP) servers, LinkedIn, Jigsaw, Shodan, Yandex, name servers, 123people, and
Exalead. Google, Yandex, Bing, and Exalead are search engines used in the
backend as sources, while Shodan is also a search engine, though not a conventional
one; we discussed it briefly earlier and will cover it in detail later in this chapter.
PGP servers are key servers used for data security, and they are also a good source
of e-mail details. 123people is for searching for a particular person, and Jigsaw is a
cloud-based solution for lead generation and other sales work. TheHarvester collects
different information from different sources: for e-mail harvesting it uses Google,
Bing, PGP servers, and sometimes Exalead, running their specific queries in the
background to get the desired result. Similarly, for subdomains or host names it again
uses Google, Bing, Yandex, Exalead, and PGP servers. And finally, for the list of
employee names it uses LinkedIn, Google profiles, 123people, and Jigsaw as its
main sources.
This is how theHarvester harvests all the information and gives us the desired
result as per our query. So craft your query wisely to harvest all the required
information.
SHODAN
We briefly discussed Shodan in Chapter 4, but this unique search engine deserves
much more than a paragraph on its usage and impact. As discussed earlier, Shodan
is a computer search engine. The internet consists of various types of devices
connected and publicly available online. Most of these devices have a banner, which
they send as a response to an application request sent by a client. Many if not most
of these banners contain information which can be called sensitive in nature, such as
server version, device type, authentication mode, etc. Shodan allows us to search for
such devices over the internet and also provides filters to narrow down the results.
It is highly recommended to create an account to utilize this great tool, as
doing so removes some of the restrictions imposed on free usage. After
logging into the application we simply go to the dashboard at
http://www.shodanhq.com/home. Here we can see some of the recent searches as
well as popular searches made on this platform. This page also shows a quick
reference to the filters we can use. Moving on, more popular searches are listed
at http://www.shodanhq.com/browse. Here there are various search queries
which look quite interesting, such as webcam, default password, SCADA, etc.
Clicking on one of these takes us directly to the result page, listing details of
machines on the internet matching that specific keyword. The page
http://www.shodanhq.com/help/filters shows the list of all the filters we can use
in Shodan to perform a more focused search, such as country, hostname, port, etc.
FIGURE 6.8
Shodan popular searches.
FIGURE 6.9
Shodan filters.
Let's perform a simple search on Shodan for the keyword "webcam." Shodan finds
more than 15,000 results for this keyword; though we cannot view all the results under
the free package, what we get is enough to understand its reach and the availability
of such devices on the internet. Some of these might be protected by an
authentication mechanism such as a username and password, but some might be
publicly accessible without any such mechanism. We can simply find out by opening
their listed IP addresses in our browsers (warning: doing so might be illegal depending
upon the laws of the country, etc.). We can further narrow these results down to a
country by using the "country" filter. So our new query is "webcam country:us", which
gives us a list of webcams in the United States of America.
FIGURE 6.10
Shodan results for query “webcam”
To get a list of machines running the file transfer protocol (FTP) service in India,
we can use the query "port:21 country:in". We can also search for a specific IP
address, or a range of them, using the filter "net." Shodan provides a great deal of
relevant information, and its application is limited only by the creativity of its users.
FIGURE 6.11
Shodan results for query "port:21 country:in."
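Shodan filters are plain name:value tokens appended to the search keywords, so queries like the two above can be composed programmatically. A small helper sketching that composition (a convenience function of our own, not part of any official Shodan client; actually running the queries requires a Shodan account):

```python
# Compose a Shodan search string from keywords plus "name:value" filters.
def build_shodan_query(keywords="", **filters):
    parts = [keywords] if keywords else []
    for name, value in sorted(filters.items()):
        value = str(value)
        # Quote values containing spaces so they stay one filter argument.
        if " " in value:
            value = '"%s"' % value
        parts.append("%s:%s" % (name, value))
    return " ".join(parts)

q1 = build_shodan_query("webcam", country="us")
q2 = build_shodan_query(port=21, country="in")
```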
Apart from this, Shodan also offers an API to integrate its data into our own
applications. There are also some other paid services, which are worth a try for
anyone working in the information security domain. Recently there has been a lot
of development in Shodan and its associated services, which makes this product a
must-try for information security enthusiasts.
SEARCH DIGGITY
In the last chapter we learned a lot about the advanced search features of various
search engines and also briefly discussed the term "Google Hacking." To perform
such searches we need the list of operators we can use and have to type each query
to see if anything is vulnerable; but what if there were a tool with a database of such
queries which we could simply run? Enter Search Diggity. Search Diggity is a tool
by Bishop Fox which has a huge set of options and a large database of queries for
various search engines, allowing us to gather compromising information related to
our target. It can be downloaded from
http://www.bishopfox.com/resources/tools/google-hacking-diggity/attack-tools/. The
basic requirement for its installation is Microsoft .NET Framework v4.
Once we have downloaded and installed the application, we need the search
IDs and API keys. These search IDs/API keys are required so that we can
perform a greater number of searches without too many restrictions. We can find out
how to get and use these keys in the contents section under the Help tab, or through
some simple Google searches. Once all the keys (Google, Bing, Shodan, etc.) are in
place we can move forward with using the tool.
FIGURE 6.12
Search Diggity interface.
There are many tabs in the tool, such as Google, Bing, DLP, Flash, Shodan, etc.
Each of these tabs provides specialized functions to perform a targeted search and
identify information which can be critical from an information security point of
view.
To use the tool we simply select one of the tabs at the top and then select the
type of queries we want to use. We can also specify the domain we want to target
and simply run the scan. Depending upon what is available online, the tool will
return results for the various queries related to the query type we selected. It is
highly recommended to select only the query types we are really interested in, as
this narrows down the total number of queries. The queries are properly categorized
so we can identify them and choose accordingly.
Let’s use the queries to identify SharePoint Administrative pages. For this
we simply need to select the Google tab and from the left-hand menu, check the
Administrative checkbox under SharePoint Diggity, and run the scan.
FIGURE 6.13
Search Diggity scan—Google tab.
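Conceptually, a scan like the one above just runs each stored dork through a search engine, scoped to the chosen sites. A sketch of how such query URLs could be assembled (the dorks and the URL format here are illustrative, not Search Diggity's internals):

```python
# Turn a list of stored dork strings into search-engine query URLs,
# optionally scoped to a target site with the "site:" operator.
from urllib.parse import urlencode

DORKS = [
    'inurl:"/_layouts/settings.aspx"',        # example SharePoint admin page dork
    'intitle:"index of" "parent directory"',  # example directory-listing dork
]

def dork_urls(dorks, site=None):
    urls = []
    for dork in dorks:
        query = ("site:%s %s" % (site, dork)) if site else dork
        urls.append("https://www.google.com/search?" + urlencode({"q": query}))
    return urls

urls = dork_urls(DORKS, site="example.com")
```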
To make this scan more targeted we can specify a list of targets under the option
Sites/Domains/IP Ranges. As soon as we start the scan we can see the results coming
up with information such as category, page title, URL, etc. Similarly we can use the
Bing scan, which has its own set of search queries.
FIGURE 6.16
Search Diggity—Shodan scan.
RECON-NG
There are many tools for reconnaissance, but a special mention should be given to
Recon-ng. This is an open source tool written in Python, primarily by Tim Tomes
(@LaNMaSteR53). Many other researchers, coders, and developers have contributed
to the project. It is one of a kind as a complete OSINT framework. The authors might
disagree with that statement, but this framework helps OSINT enthusiasts perform
the various stages of reconnaissance in an automated way.
It mainly focuses on web-based open source reconnaissance and provides its users
with unique independent modules, elaborate and much-needed command-based
help, database interaction, and command completion to perform reconnaissance
deeply and at a fast pace. Apart from that, it is built so that a newcomer to the
security field can easily contribute to it with a little Python knowledge. Thanks to
its well-structured modules, fully fledged documentation, and use of only native
Python functions, a new user or contributor will not have to download and install
third-party Python modules for a specific task.
The tool can be downloaded from: https://bitbucket.org/LaNMaSteR53/recon-ng
The user guide: https://bitbucket.org/LaNMaSteR53/recon-ng/wiki/Usage_Guide
The development guide: https://bitbucket.org/LaNMaSteR53/recon-ng/wiki/Development_Guide
Apart from the developer or contributor perspective, the author has also focused on
ease of use. The framework looks much the same as Metasploit, a popular
exploitation tool in the information security community. If you are from that
community or have prior experience with Metasploit, using Recon-ng will feel
quite familiar.
Recon-ng is quite easy to install, and to run it we just need Python 2.7.x
on our system. Just call the recon-ng.py file from a terminal and you
will get a fancy banner of the tool with credits, along with a recon-ng
prompt.
To check all the available commands we can use the command help, which will
show them all:
> help
add          Adds records to the database
back         Exits the current context
del          Deletes records from the database
exit         Exits the framework
help         Displays this menu
keys         Manages framework API keys
load         Loads specified module
pdb          Starts a Python Debugger session
query        Queries the database
record       Records commands to a resource file
reload       Reloads all modules
resource     Executes commands from a resource file
search       Searches available modules
set          Sets module options
shell        Executes shell commands
show         Shows various framework items
spool        Spools output to a file
unset        Unsets module options
use          Loads specified module
workspaces   Manages workspaces
This framework provides some fine features, such as workspaces. A workspace
holds the settings, database, etc. for a single project, giving it a self-contained place
of its own.
To know more about workspaces, we can use the command
> help workspaces
This command is used to manage workspaces, giving the user the freedom to list,
add, select, and delete them. If a user does not set a workspace explicitly, then he/she
will be in the default workspace. To check which workspace we are currently in, the
command is
> show workspaces
+------------+
| Workspaces |
+------------+
| default    |
+------------+
We will get something similar to this, showing that we are in the default
workspace.
Let's say we want to change the workspace to one of our own, say osint; then the
command would be
> workspaces add osint
The prompt itself shows the workspace, so the default prompt we get in a fresh
installation is
[recon-ng] [default] >
After the above command the prompt will change to
[recon-ng] [osint] >
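Under the hood a workspace is essentially a per-project directory with its own settings and database. The sketch below models that idea with a directory plus a SQLite file (the layout and table name are illustrative, not recon-ng's actual schema):

```python
# Model workspaces as per-project directories, each with its own SQLite
# database. Table layout here is a made-up stand-in for illustration.
import os
import sqlite3
import tempfile

def create_workspace(root, name):
    path = os.path.join(root, name)
    os.makedirs(path, exist_ok=True)
    db = sqlite3.connect(os.path.join(path, "data.db"))
    db.execute("CREATE TABLE IF NOT EXISTS hosts (host TEXT, ip TEXT)")
    db.commit()
    db.close()
    return path

def list_workspaces(root):
    return sorted(d for d in os.listdir(root)
                  if os.path.isdir(os.path.join(root, d)))

root = tempfile.mkdtemp()
create_workspace(root, "default")
create_workspace(root, "osint")
```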
Now it's time to explore the commands and their capabilities. If you are using this
tool for the first time, the most needed command after "help" is "show."
[recon-ng] [osint] > show
Using this command we can see the available details of banner, companies, contacts,
credentials, dashboard, domains, hosts, keys, leaks, locations, modules, netblocks,
options, ports, pushpins, schema, vulnerabilities, and workspaces; but here we want
to explore the modules section to see what possibilities are available.
Basically recon-ng consists of five different sections of modules.
1. Discovery
2. Exploitation
3. Import
4. Recon
5. Reporting
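Since recon-ng names its modules by path, the section is simply the first path component. A small sketch recovering the five sections from an abbreviated module list (the handful of paths shown are taken from this chapter, not the full framework):

```python
# Group module paths by their top-level section (discovery, exploitation,
# import, recon, reporting). Sample list abbreviated from the text.
MODULES = [
    "discovery/info_disclosure/interesting_files",
    "exploitation/injection/command_injector",
    "import/csv_file",
    "recon/companies-contacts/jigsaw",
    "recon/domains-vulnerabilities/punkspider",
    "reporting/csv",
]

def modules_by_section(paths):
    sections = {}
    for path in paths:
        sections.setdefault(path.split("/", 1)[0], []).append(path)
    return sections

sections = modules_by_section(MODULES)
```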
FIGURE 6.17
Recon-ng modules.
By using the following command we can see more details of the options available
under these five sections:
[recon-ng] [osint] > show modules
For example: under discovery, interesting files; under exploitation, command
injection; under import, CSV files; under recon, company contacts, credentials, host
details, location information, and many more; and last but not least, under reporting,
CSV, HTML, XML, etc.
Now we can use these modules based on our requirements. To use any of these
modules we first need to load it using the load command; but before that we should
know that this framework has a unique capability to load a module by auto-completing
its name, or, if more modules match a single keyword, by listing all of them. Let's say
we want to check pwnedlist and are too lazy to type the absolute path. Nothing to
worry about; just do as shown below:
[recon-ng] [osint] > load pwnedlist
Recon-ng will check whether this string is associated with a single module or
multiple modules. If it is associated with a single module it will load it; otherwise it
will give the user all the available modules that contain this keyword.
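That load behavior, loading directly on a unique match and listing the candidates otherwise, can be modeled in a few lines. This is a sketch of the idea, not recon-ng's implementation:

```python
# Resolve a partial module name: return the module on a unique match,
# or the list of candidates when the keyword is ambiguous.
def resolve_module(keyword, modules):
    matches = [m for m in modules if keyword in m]
    if len(matches) == 1:
        return matches[0]  # unambiguous: load it directly
    return matches         # ambiguous (or none): show the candidates

MODULES = [
    "recon/contacts-creds/pwnedlist",
    "recon/contacts-creds/haveibeenpwned",
    "recon/companies-contacts/jigsaw",
]
loaded = resolve_module("pwnedlist", MODULES)
ambiguous = resolve_module("contacts", MODULES)
```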
FIGURE 6.19
Recon-ng module detailed information.
This command provides detailed information: the name of the module, its path, the
author's name, and a full description.
As we can see from the above figure, we need to add a SOURCE as an input
to run this module; the kind of input needed is also mentioned in the bottom
part of the same figure. We can craft a command such as
[recon-ng] [osint] [pwnedlist] > set SOURCE google@gmail.com
This command is accepted as we provided a proper, valid input. Now the command
to run this module, to check whether the above e-mail id has been pwned somewhere
or not, is
[recon-ng] [osint] [pwnedlist] > run
[recon-ng] [osint] [pwnedlist] > set SOURCE google@gmail.com
SOURCE => google@gmail.com
[recon-ng] [osint] [pwnedlist] > run
[*] google@gmail.com => Pwned! Seen at least 27 times, as recent as 2014-08-28.
SUMMARY
[*] 1 total (0 new) items found.
FIGURE 6.20
Recon-ng results.
Voila! The above e-mail id has been pwned somewhere. If we want to use other
modules we simply use the "load" command along with the module name.
This is how easily we can use recon-ng. The commands and approach remain
much the same throughout: look for the modules, choose the required module, load
it, check its options, provide values for the required fields, and then run. If required,
repeat the same process to extend the reconnaissance.
Now let’s discuss some of the scenarios and the modules that can be handy for
the same.
CASE 1
If we are in sales and desperately want to build a database of prospective clients,
there are certain modules here that will be pretty helpful. If we want to gather this
information from social networking sites, LinkedIn is the only place where we can
get exact names and other details, as other sites generally contain fancy aliases. And
if we are in core sales we might have heard of portals like Salesforce or Jigsaw,
where we can get certain details either for free or for a reasonable amount of money.
Nowadays sales teams in the IT sector focus less on cold calling and more on
reaching out over e-mail, so getting valid e-mail addresses from a target organization
is half the work done for a sales team. Here we will discuss the sources available to
get this information and the associated modules in recon-ng.
Available modules:
recon/companies-contacts/facebook
recon/companies-contacts/jigsaw
recon/companies-contacts/linkedin_auth
These are some, though not all, of the modules that can help gather information
such as name, position, address, etc.
But e-mail addresses are the key to making contact, so let's look at some options
for collecting them. We can collect some e-mail details from the Whois database,
and search engines and PGP servers also play a vital role in collecting e-mail
addresses.
Available modules:
recon/domains-contacts/pgp_search
recon/domains-contacts/whois_pocs
CASE 2
Physical tracking. The use of smartphones has, intentionally or unintentionally,
allowed users to attach their geolocation to the data they upload to different public sites
Copyrighted material
120 CHAPTER 6 OSINT Tools and Techniques
such as YouTube, Picasa, etc. In that case we can collect information with the help of
geotagged media. This can be used for behavioral analysis, understanding a person's
likes and dislikes, etc.
Available modules:
recon/locations-pushpins/flickr
recon/locations-pushpins/picasa
recon/locations-pushpins/shodan
recon/locations-pushpins/twitter
recon/locations-pushpins/youtube
CASE 3
If an organization or person wants to check whether their own or any of the company's
e-mail ids has been compromised, certain modules can be helpful. Similar to
pwnedlist, which we already discussed above, other modules give similar results:
recon/contacts-creds/pwnedlist
recon/contacts-creds/haveibeenpwned
recon/contacts-creds/should_change_password
CASE 4
For penetration testers this is also a hidden treasure, because they can perform part
of a penetration test without sending a single packet from their own environment.
The first step in any penetration test is information gathering. Say we want to
perform a web application penetration test: the first thing to enumerate is the
technology or server the site is running on, so that we can later search manually for
publicly available exploits against it. Recon-ng has a module to find the technology
details for us.
Available module:
recon/domains-contacts/builtwith
After getting these details we generally look for vulnerabilities available on the net
for that technology. But we can also look at vulnerabilities associated with the
domain itself, using the punkspider module. PunkSpider uses a web scanner to crawl
the web, collect detailed vulnerabilities, and store them in its database, which can
then be searched directly for the exposed vulnerabilities of a site.
Available modules:
recon/domains-vulnerabilities/punkspider
recon/domains-vulnerabilities/xssed
FIGURE 6.21
Recon-ng PunkSpider in progress.
From a network penetration-testing perspective, port scanning is also important, and the framework has modules to perform it as well.
Available module:
recon/netblocks-ports/census_2012
Apart from these there are direct exploitation modules available such as
exploitation/injection/command_injector
exploitation/injection/xpath_bruter
There are different modules for different functions; one major function among them is credential harvesting. Researchers are still contributing to the project and the authors keep expanding its features. Its ease of use and well-structured modules make this framework one of the most popular tools for OSINT.
YAHOO PIPES
Yahoo Pipes is a unique application from Yahoo which gives users the freedom to select different information sources and apply customized rules to get filtered output tailored to their own requirements. The best thing about the tool is its friendly GUI, with which even a casual internet user can create his/her own pipes to get the desired filtered information from different sources.
122 CHAPTER 6 OSINT Tools and Techniques
As OSINT enthusiasts, the only thing that matters to us is valid, required information. Information is available in different parts of the web, and different sources publish it regularly. The problem is how to separate the information we want from the mass of information a particular source provides. Filtering the required information manually from such a set takes a lot of effort, so an application that eases the process helps a lot.
Requirements:
• A web browser
• Internet connectivity
• Yahoo Id
As it's a web application we can access it from anywhere; the minimal dependencies and user-friendly GUI make it all the more usable. The application is available at the URL below.
https://pipes.yahoo.com/
Visit this URL, log in with a Yahoo ID, and we are all set to use the application. Another major plus point of this application is its well-formed documentation. Apart from that, the site itself links to different tutorials (text as well as video) describing how to start, along with other advanced topics. For reference there are also links to popular pipes. Let's create our own pipe.
To create our own pipe we need to click on the Create pipe button in the application, which redirects to http://pipes.yahoo.com/pipes/pipe.edit.
In the top right corner we can find tabs such as new, save, and properties; by default we don't need to do anything with these. As we are about to create a new pipe, note that on the left side of the application there are different tabs and subtabs such as sources, user inputs, operators, URL, etc. These are the tabs from which we drag the modules to design the pipe.
Basically a pipe starts with a source or multiple sources. Then we need to create
some filters as per our requirements using operators, date, location, etc., and then
finally need to add an output to get the desired filtered information.
To start, let's drag a source from the sources subtab; there are different options available such as Fetch CSV, Fetch Data, Fetch Feed, etc. Let's fetch from feeds, as they are a very good source of information. Drag the Fetch Feed subtab to the center of the application. When we drag anything to the center it generates a box for us, asking for the feed URL. Add any feed URL; in my case I am using http://feeds.bbci.co.uk/news/rss.xml?edition=int.
For demo purposes I'll show only a single-source example, but we can also add multiple sources to one pipe. Now it's very important to create a proper filter, which will give us the proper output. Drag the Filter subtab from the Operators tab. By default we will see the keywords "block," "all," and "contains" there, with some blanks to fill. Change "block" to "Permit," keep the "all" as it is, add item.description in the first blank, followed by "contains," followed by US. Our filter will thus pass only data that contains the keyword "US" in its item description.
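The pipe built above — fetch a feed, then permit only items whose description contains "US" — boils down to a few lines of logic. A rough equivalent in Python using only the standard library:

```python
import xml.etree.ElementTree as ET

def filter_feed(rss_xml, keyword):
    """Return titles of RSS <item>s whose <description> contains keyword.

    Mirrors the pipe: Fetch Feed -> Filter (permit items whose
    item.description contains the keyword) -> Pipe Output.
    """
    root = ET.fromstring(rss_xml)
    return [item.findtext("title", default="")
            for item in root.iter("item")
            if keyword in item.findtext("description", default="")]
```

Feeding it the raw XML of the BBC feed above with the keyword "US" would reproduce the pipe's output list.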
FIGURE 6.22
Creating a Yahoo Pipe.
Now connect all the pipe points from the sources box (Fetch Feed) to the Filter box and from the Filter box to the Pipe Output box. First save the pipe and then run it to get the output in a new tab.
FIGURE 6.23
Yahoo Pipe result.
We can use it in many other scenarios, like collecting images of a specific person from Flickr, filtering information by URL, date, or location, and many others. Explore it to create pipes as customized as possible; this tool provides the freedom to create pipes way beyond our imagination.
MALTEGO
There are many OSINT tools available in the market, but one tool stands out because of its unique capabilities: Maltego.
Maltego is an OSINT application which provides a platform to not only extract data
but also represent that data in a format which is easy to understand as well as analyze.
It's a one-stop shop for most of the recon requirements during a pentest; what adds to its already great functionality is the feature that allows users to create custom add-ons for the platform (which we will discuss later), depending upon the requirement.
Currently Maltego is available in two versions: commercial and community. The commercial version is paid and requires a license key. The community version, however, is free; we only need to register on the site of Paterva (the creator of Maltego) at this page: https://www.paterva.com/web6/community/maltego/index.php. Though the community version has some limitations compared to the commercial one, such as a limited amount of data extraction and no user support, it is still good enough to feel the power of this great tool. Throughout this chapter we will be using the community version for demo purposes.
Let’s see how this tool works and what we can utilize it for.
First of all, unlike most application software used for recon, Maltego provides a GUI, which not only makes it easier to use but is a feature in itself, as the data representation is what makes it stand out from the crowd. It basically works on a client-server architecture, which means that what we as users get is a Maltego client which interacts with a server to perform its operations.
Before going any further let’s understand the building blocks of Maltego as listed
below.
ENTITY
An entity is a piece of data which is taken as an input to extract further information. Maltego is capable of taking a single entity or a group of entities as an input to extract information. Entities are represented by icons above the entity names; e.g., the domain name xyz.com is represented by a globe-like icon.
TRANSFORM
A transform is a piece of code which takes an entity (or a group of entities) as an input and extracts data in the form of entities based upon the relationship. E.g., DomainToDNSNameSchema: this transform will try various name schemas against a domain (entity).
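A later chapter covers writing transforms in detail, but for a feel of the mechanism: a local transform is just a script that receives the entity value and prints an XML response the client understands. A minimal, illustrative sketch (the name-schema prefixes and the hand-built XML are simplifying assumptions; Paterva ships a proper library for this):

```python
import sys
from xml.sax.saxutils import escape

# Hypothetical name schemas to emit; the real transform would test each
# candidate against DNS, which is omitted here for brevity.
PREFIXES = ("www", "mail", "ns1", "ftp")

def response_xml(domain):
    """Build a MaltegoTransformResponseMessage for one domain entity."""
    entities = "".join(
        '<Entity Type="maltego.DNSName"><Value>%s.%s</Value></Entity>'
        % (prefix, escape(domain)) for prefix in PREFIXES)
    return ("<MaltegoMessage><MaltegoTransformResponseMessage>"
            "<Entities>%s</Entities>"
            "</MaltegoTransformResponseMessage></MaltegoMessage>" % entities)

if __name__ == "__main__" and len(sys.argv) > 1:
    # Maltego passes the entity value as the first command-line argument.
    print(response_xml(sys.argv[1]))
```

The client parses the printed XML and draws each returned entity on the graph, linked to the input entity.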
MACHINE
A machine is basically a set of transforms linked programmatically. A machine is very useful in cases where the starting data (in the form of an entity) and the desired output data are not directly linked through a single transform but can be reached through a series of transforms run in a custom fashion. E.g., Footprint L1: a machine which takes a domain as an input and generates various types of information related to the organization, such as e-mails, Autonomous System (AS) number, etc.
First of all, as mentioned above, we need to create an account for the community version. Once we have an account we need to download the application from https://www.paterva.com/web6/products/download3.php. The installation is pretty straightforward and the only requirement is Java. Once the installation is complete we simply open the application and log in using the credentials created during the registration process.
Now as the installation and login processes are complete, let’s move on to the
interface of Maltego and understand how it works. Once we are logged into the
application it will provide us with some options to start with; we will be starting with
a blank graph so that we can understand the application from scratch. Now Maltego
will present a blank page with different options on top bar and a palette bar on the
left. This is the final interface we will be working on.
FIGURE 6.24
Maltego interface.
In the top left corner of the interface is the Maltego logo; clicking on it lists the options to create a new graph, save the graph, import/export configurations/entities, etc. The top bar in the interface presents five options; let's discuss them in detail:
INVESTIGATE
This is the first option in the top bar, providing basic functions such as cut, copy, paste, search, and link/entity selection and addition. One important option provided is Select by Type; this option comes in handy when there is a huge amount of data in the graph after running a different set of transforms or machines and we are seeking a specific data type.
MANAGE
The Manage option basically deals with entity and transform management, with some other minor functions such as notes and different panel arrangements. Under the Entities tab we get the options to create new entities, manage existing ones, and import/export them; similarly the Transforms tab presents the options to discover new transforms, manage existing ones, and create new local transforms (we will discuss creating local transforms in a later chapter).
FIGURE 6.25
Maltego Manage tab.
ORGANIZE
Once we are done with extracting the data, we need to arrange the graph to understand it better; this is where the Organize option comes in. Using the underlying options we can set the layout of the complete graph, or of selected entities, into different forms, such as Hierarchical, Circular, Block, etc. We can also set the alignment of entities using the functions under the "Align Selection" tab.
FIGURE 6.26
Maltego Organize tab.
MACHINES
As described before, machines are an integral part of the application. The Machines tab provides the options to run a machine, stop all machines at once, create new machines (which we will discuss in a later chapter), and manage existing ones.
COLLABORATION
This tab is used to utilize the feature, introduced in later versions of Maltego, which allows different users to work as a team. Using the underlying options users can share their graphs with other users in real time as well as communicate through the chat feature. This can be very helpful in Red Team environments.
The palette bar on the left is used to list all the different types of entities present
in Maltego. The listed entities are categorized according to their domain. Currently
Maltego provides 20+ entities by default.
Now that we are familiar with the interface, we can move on to the working of Maltego.
First of all to start with Maltego we need a base entity. To bring an entity into the
graph we simply need to drag and drop the entity type we need to start with, from
the palette bar on the left. Once we have the entity in the graph, we can either double
click on the name of the entity to change its value to the value of our desire or double
click on the entity icon which pops up the details window where we can change data,
create note about that entity, attach an image, etc. One thing that we need to keep in
mind before going any further is to provide the entity value correctly depending upon
the entity type e.g. don’t provide a URL for an entity type “domain.”
Once we have set the value of an entity we need to right click on it and check the transforms listed for that specific entity type. Under the "Run Transform" tab we can see the "All Transforms" tab at the top, which lists all the transforms available for the specific entity type; below that we can see different tabs which contain the same transforms classified under different categories. The last tab is again "All Transforms," but use this one carefully, as it will execute all the listed transforms at once. This takes a lot of time and resources and might result in a huge amount of data that we don't want.
Now let's take up the example of a domain and run some transforms. To do this simply drag and drop the domain entity, under Infrastructure, from the palette bar to the graph screen. Now double click on the label of the entity and change it to, let's say, google.com. Right click on it, go to "All Transforms" and select "To DNS Name - NS (name server)." This transform will find the name server records of the domain. Once we select the transform we can see the results start to populate on the graph screen. The progress bar at the bottom of the interface shows whether the transform is complete or still running. Now we can see that Maltego has found some name server (NS) records for the domain. We can further select all the listed NS records and run a single transform on them. To do this, simply select the region containing all the records and right click to select a transform. Let's run the transform "To Netblock [Blocks delegated to this NS]," which will check if the NS records have any (reverse) DNS netblocks delegated to them. In the graph window itself we can see at the top that there are some options to try, like Bubble View, which shows the graph as a social network diagram with the entity size depending upon the number of inbound and outbound edges; the Entity List, which as the name suggests lists all the entities in the graph; and some others like freeze view and changing the layout to Block, Hierarchical, Circular, etc.
FIGURE 6.27
Maltego Transform result (Domain to DNS Name - NS (name server)).
Similar to running a transform on an entity, we can also run a machine. Let's stick to our example and take a domain entity with the value google.com. Now we simply need to right click on the entity, go to the "Run Machines" tab and select a machine. For this example let's simply run the machine "Footprint L1," which performs a basic footprint of the domain provided. Once the machine has executed completely we can see that it displays a graph with different entities such as name servers, IP addresses, websites, AS number, etc. Let's move forward and see some specific scenarios for data extraction.
DOMAIN TO WEBSITE IP ADDRESSES
Simply take a domain entity and run the transform "To Website DNS [using Search Engine]." It queries a search engine for websites and returns the responses as website entities. Now select all the website entities we got after running the transform and run the transform "To IP Address [DNS]." This simply runs a DNS query and gets us the IP addresses of the websites. This sequence of transforms can help us get a fair understanding of the IP range owned by the organization (owning the domain). We can also see which websites have multiple IP addresses allocated to them. Simply changing the layout of the graph, say to circular, can help in getting a better understanding of this particular infrastructure. Information like this is crucial for an in-depth pentest and can play a game-changing role.
E.g.: Domain = google.com
FIGURE 6.28
Maltego Transform result (Domain to Website IP).
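The same two-step idea — enumerate websites, then resolve them and group by IP to spot shared infrastructure — can be sketched outside Maltego in a few lines (the hostnames stand in for whatever the search-engine transform returned):

```python
import socket
from collections import defaultdict

def resolve_all(hostnames):
    """Map each hostname to its IPv4 addresses (empty list on failure)."""
    mapping = {}
    for host in hostnames:
        try:
            infos = socket.getaddrinfo(host, None, socket.AF_INET)
            mapping[host] = sorted({info[4][0] for info in infos})
        except socket.gaierror:
            mapping[host] = []
    return mapping

def group_by_ip(mapping):
    """Invert host -> IPs into IP -> hosts to spot shared infrastructure."""
    groups = defaultdict(list)
    for host, ips in mapping.items():
        for ip in ips:
            groups[ip].append(host)
    return dict(groups)

# Usage (requires network access):
#   group_by_ip(resolve_all(["www.example.com", "mail.example.com"]))
```

Hosts that cluster under one IP are likely served from the same box or load balancer, which is exactly the pattern the circular graph layout makes visible.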
DOMAIN TO E-MAIL ADDRESS
There is a set of transforms for extracting e-mail addresses directly from a domain, but for this example we will follow a different approach using metadata. Let's again take a domain entity and run all the transforms in the set "Files and Documents from Domain." As the name says, it looks for files listed in search engines for the domain. Once we get a bunch of files, we can select them and run the transform "Parse meta information," which extracts the metadata from the listed files. Now let's run all the transforms in the set "Email addresses from person" on the resulting person entities, providing the appropriate domain (the domain we are looking for in the e-mail addresses) and leaving the additional terms blank. We can compare the result of this final transform with the result of running the e-mail extraction transform set directly on the domain and see how the results differ.
E.g.: Domain = paterva.com
FIGURE 6.29
Maltego Transform result (Domain to Email address).
PERSON TO WEBSITE
For this example we will use the machine "Person - Email address." Let's take an entity of type person, assign it the value "Andrew MacPherson" and run the machine on this entity. The machine will start to enumerate associated e-mail IDs using different transforms. Once it has completed running one set of transforms it provides the option to move forward with selected entities enumerated so far. From the above example we know "andrew@punks.co.za" is a valid e-mail address, so we will go ahead with this specific entity only. What we get as an end result is the websites where this specific e-mail address occurs, found by running the transform "To Website [using Search Engine]" (as part of the machine).
These examples clearly demonstrate the power of this sophisticated tool. Running a series of transforms or a machine can enumerate a lot of data which can be very helpful during a pentest or a threat-modeling exercise. Extracting a specific type of data from another data type can be done in different ways (using different series of transforms). The best way to achieve what we want is to run a series of transforms, eliminate the data we don't need, then run another sequence of transforms in parallel to verify the data we have got. This exercise not only helps to verify the credibility of the data but sometimes also produces unique revelations.
Maltego even allows us to save the graph we have generated into a single file in the "mtgx" format for later use or sharing. We can also import and export entities as well as configurations. This feature allows us to carry our custom environment with us and use it even on different machines.
FIGURE 6.30
Saving Maltego results.
Apart from the prebuilt transforms, Maltego allows us to create our own. This feature allows us to customize the tool to extract data from various other sources that we find useful for a specific purpose, for example an API which returns a company name given its phone number.
For custom transforms we have got two options:
Local transforms: These transforms are stored locally on the machine on which the client is running. This type of transform is very useful when we don't need or want others to run the transform, or when we want to execute a task locally. They are simple to create and deploy. The major drawback is that if we need to run one on multiple machines we need to install it separately on each of them, and the same goes for updates.
TDS transforms: TDS stands for transform distribution server. It is a web application which allows the distribution as well as management of transforms. The client simply probes the TDS, which calls the transform scripts and presents the data back to the client. Compared to local transforms they are easier to set up and update.
We will learn how to create transforms in a later chapter.
So these are some of the tools which can play a very crucial part in an information-gathering exercise. Some of them are more focused on information security and some are generic. The main takeaway here is that there are a bunch of tools out there which can help us extract relevant information within minutes, and if used in a proper and efficient manner these tools can play a game-changing role in our data extraction process. There is something for everyone; it's just a matter of knowing how data is interconnected and hence how one tiny bit of information may open Pandora's box. In the next chapter we will move forward and learn about the exciting world of metadata. We will deal with topics like what metadata is, how it is useful, and how to extract it. We will also see how it can be used against us and how to prevent that from happening.
CHAPTER 7
Metadata
INFORMATION IN THIS CHAPTER
• Metadata
• Impact
• Metadata Extraction
• Data Leakage Protection (DLP)
INTRODUCTION
In the last few chapters we have learned extensively about how to find information
online. We learned about different platforms, different techniques to better utilize
these platforms, and also tools which can automate the process of data extraction. In
this chapter we will deal with a special kind of data, which is quite interesting but
usually gets ignored, the metadata.
Earlier, metadata was a term mostly talked about in the field of information science only, but with recent news reports stating that the National Security Agency has been snooping on metadata related to the phone records of its citizens, it is becoming a household name. Still, many people don't understand exactly what metadata is and how it can be used against them, let alone how to safeguard themselves from an information security point of view.
The very basic definition of metadata is that it is "data about data," but sometimes that is a bit confusing. For understanding purposes we can say that metadata is something which somehow describes the content but is not part of the content itself. For example, in a video file the length of the video can be its metadata, as it describes how long the video will play but is not part of the video itself. Similarly, for an image file, the make of the camera used to take the picture, or the date when it was taken, can be its metadata: it tells us something about the picture but is not actually the content of the picture. We have all encountered this kind of data for different files at some point. Metadata can be anything: the name of the creator of the content, the time of creation, the reason for creation, copyright information, etc.
Hacking Web Intelligence. http://dx.doi.org/10.1016/B978-0-12-801867-5.00007-0
Copyright © 2015 Elsevier Inc. All rights reserved.

The creation of metadata actually started long ago in libraries, when people had information in the form of scrolls but no way to categorize them and find them quickly when needed. Today, in the digital age, we still use metadata to categorize files, search them, interconnect them, and much more. Most of the files that reside in our computer systems have some kind of metadata. It is also one of the key components needed for the creation of the semantic web.
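As a quick illustration, even the file system keeps metadata about every file — data describing the file without being part of its content:

```python
import os
import time

def file_metadata(path):
    """Return metadata the file system keeps about a file: its size and
    last-modification time, neither of which is part of the content."""
    st = os.stat(path)
    return {"bytes": st.st_size,
            "modified": time.strftime("%Y-%m-%d %H:%M:%S",
                                      time.localtime(st.st_mtime))}
```

Calling this on any file returns descriptive attributes that exist independently of whatever the file actually contains.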
Metadata is very helpful in managing and organizing files and hence is used extensively nowadays. Most of the time we don't even make a distinction between the actual content and its metadata. Metadata is usually added to a file by the software used to create it. For a picture it can be the camera that was used to take it; for a doc file, the operating system used; for an audio file, the recording device. Usually it is harmless, as it does not reveal any data which could be sensitive from an information security perspective. Or does it? We will see soon, in the following portion of this chapter.
There are a huge number of places where metadata is used, from the files in our systems to the websites on the internet. In this chapter we will mainly focus on extracting metadata from places which are critical from an information security viewpoint.
METADATA EXTRACTION TOOLS
Let's discuss some of the tools which can be used for metadata extraction.
JEFFREY’S EXIF VIEWER
Exif (exchangeable image file format) is basically a standard used by devices which handle images and audio files, such as video recorders, smartphone cameras, etc. It contains data like the image resolution, the camera used, color type, compression, etc. Most smartphones today contain a camera, a GPS (global positioning system) receiver, and internet connectivity. Many smartphones, when we take a picture, automatically determine our geolocation using the GPS receiver and embed that information into the picture just taken. Being active on social networks, we then share these pictures with the whole world.
Jeffrey's Exif Viewer is an online application (http://regex.info/exif.cgi) which allows us to see the Exif data present in any image file. We can simply upload a file from our machine or provide its URL. If an image contains geolocation data, it will be presented in the form of coordinates. Exif Viewer is based on ExifTool by Phil Harvey, which can be downloaded from http://www.sno.phy.queensu.ca/~phil/exiftool/. ExifTool not only allows us to read Exif data but also to write it to files. It supports a huge list of different formats like XMP, GIF, ID3, etc., which are listed on the same page.
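Exif stores GPS positions as degree/minute/second values plus a hemisphere reference; converting them to the decimal degrees that mapping sites expect is simple arithmetic:

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert Exif-style degree/minute/second GPS values to decimal
    degrees; southern and western hemispheres come out negative."""
    value = degrees + minutes / 60.0 + seconds / 3600.0
    return -value if ref in ("S", "W") else value
```

For instance, a latitude tagged as 28° 35' 30.7" North converts to roughly 28.5919 decimal degrees, which can be pasted straight into any online map.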
[The viewer's Basic Image Information pane lists the camera (Nokia Lumia 630), the exposure settings, the date the picture was taken, and its location as latitude/longitude coordinates (28° 35' 30.7" North, 77° 22' 17.5" East), along with an address guessed from the coordinates, the altitude, and links to the spot on Google, Yahoo, and OpenStreetMap maps.]
FIGURE 7.1
Jeffrey's Exif Viewer.
[ExifTool console output for an image, listing tags such as color-profile and measurement data, image width and height, encoding process, bits per sample, and image size.]
FIGURE 7.2
ExifTool interface.
Using the geolocation in the images we share, anyone can easily track where exactly we were at the time a picture was taken. This can be misused by people with ill intentions, such as stalkers. So we should consider carefully whether we want to share just our pictures, or our locations too.
EXIF SEARCH
We just discussed Exif and its power to geolocate content. There is a dedicated search engine which allows us to search through geotagged images, called Exif Search (http://www.exif-search.com/).
This search engine provides data about images and pictures from all over the internet. It contains a huge number of searchable Exif images from different mobile devices. Unlike traditional image search engines, which tend to provide just the image as a result, Exif Search also provides the metadata. When we search in Exif Search, it looks up the image and its information in its own database and returns the result. Currently it has more than 100 million images with metadata, and it is constantly updating its database.
This search engine gives the user the freedom to search for an image based on location, date, and device type, and also allows us to sort the results by these attributes. Another unique feature is that it lets us force the search engine to return only images that contain GPS data; a small check box just below the search bar does the work for us.
FIGURE 7.3
Exif-search.com interface.
It also supports a huge number of devices. The list can be found at http://www.exif-search.com/devices.php; some of the supported makes are Canon, Nikon, Apple, and Fujifilm.
FIGURE 7.4
Exif-search.com sample search result.
ivMeta
Similar to images, video files can also contain GPS coordinates in their metadata.
ivMeta is a tool created by Robin Wood (http://digi.nmja/projects/ivmeta.php) which
allows us to extract data such as software version, date, GPS coordinates, model
number from iPhone videos. iPhone is one of the most popular smartphone available
and has a huge fan base. With more than a million users, their activity to show the
uniqueness of the iPhone standard makes them more vulnerable to metadata extrac¬
tion. No doubt on the camera quality of the devices and the unique apps to make the
pictures and videos look more trendy, iPhone users upload lots of such data content
everyday in different social networking sites. Though there is an option available on
the device to deactivate geotagging, the by-default setting and the use of GPS allows
to create metadata about any image or video taken. In this case this tool comes handy
extraction to another level by supporting all the Microsoft portable executables. It also
supports torrent files, which are the easy solution to most data-sharing requirements,
so torrent metadata extraction is definitely one of its unique features. Who would
even have thought of extracting metadata from TTF, or TrueType, fonts? But yes, this
tool supports the TTF format as well. There are many other formats it supports; we can
get the details from the following URL: https://bitbucket.org/haypo/hachoir/wiki/hachoir-metadata.
hachoir-metadata is basically a command-line tool, and by default it is very
verbose: running it without any switches provides lots of information.
# hachoir-metadata xyz.png
We can also run this tool with multiple and different file formats at a time to get
the desired result.
# hachoir-metadata xyz.png abc.mp3 ppp.flv
When we need only the MIME details we can use
# hachoir-metadata --mime xyz.png abc.mp3 ppp.flv
When we need a little more information than the MIME type we can use the --type switch
# hachoir-metadata --type xyz.png abc.mp3 ppp.flv
To explore the tool's other options we can use
# hachoir-metadata --help
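To get a feel for what a parser like hachoir does internally, here is a stdlib-only Python sketch (illustrative only, not hachoir's actual code) that walks the chunks of a PNG file and pulls out the same kind of fields hachoir-metadata reports for PNG: the image dimensions from IHDR and any tEXt key/value pairs. The sample image and its "Software" entry are fabricated for the demo.

```python
import struct
import zlib

def chunk(ctype, body):
    """Assemble one PNG chunk: length, type, body, CRC."""
    return (struct.pack(">I", len(body)) + ctype + body
            + struct.pack(">I", zlib.crc32(ctype + body)))

# A minimal in-memory 1x1 PNG carrying a tEXt metadata chunk (sample data).
png = (b"\x89PNG\r\n\x1a\n"
       + chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
       + chunk(b"tEXt", b"Software\x00Adobe Photoshop")
       + chunk(b"IEND", b""))

def png_metadata(data):
    """Walk the chunk list and collect IHDR dimensions and tEXt pairs."""
    pos, meta = 8, {}          # skip the 8-byte PNG signature
    while pos < len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        ctype = data[pos + 4:pos + 8]
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"IHDR":
            meta["width"], meta["height"] = struct.unpack(">II", body[:8])
        elif ctype == b"tEXt":
            key, _, value = body.partition(b"\x00")
            meta[key.decode()] = value.decode()
        pos += 12 + length     # 4 length + 4 type + body + 4 CRC
    return meta

print(png_metadata(png))
```

Real-world parsers such as hachoir handle dozens of formats and malformed files; the point here is simply that metadata sits in well-defined byte structures that any parser can read back out.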
FOCA
On a daily basis we work with a huge number of files such as DOC, PPT, PDF, etc.
Sometimes we create them, sometimes we edit them, and sometimes we just read through
them. Apart from the data we type into these files, metadata is also added to them.
To a normal user this data might seem harmless, but it can actually reveal a lot of
sensitive information about the system used to create it.
Most organizations today have an online presence in the form of websites
and social profiles. Apart from web pages, organizations also use different files
to share information with the general public, and these files may contain this metadata.
In Chapter 5 we discussed how we can utilize search engines to find the files
listed on a website (e.g., in Google: "site:xyzorg.com filetype:pdf"). So once we
have listed all these files, we simply need to download them and use a tool which can
extract metadata from them.
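As a toy illustration of why these downloaded files leak information, even a crude pattern match over raw PDF bytes can surface Info-dictionary strings. This is a sketch with made-up sample bytes; real extractors parse the PDF object structure rather than using a regex.

```python
import re

# Hypothetical fragment of a downloaded PDF; the Info dictionary commonly
# carries Author, Creator, and Producer strings like these.
pdf_bytes = b"%PDF-1.4 ... /Author (jsmith) /Producer (Acrobat Distiller 5.0.5) ..."

def pdf_info(data):
    """Pull simple literal-string Info entries out of raw PDF bytes."""
    pairs = re.findall(rb"/(Author|Creator|Producer)\s*\(([^)]*)\)", data)
    return {key.decode(): value.decode() for key, value in pairs}

print(pdf_info(pdf_bytes))
```

A username and an exact software version, which is precisely the kind of detail an attacker wants, fall out of a single file this way.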
FOCA is a tool which does this complete process for us. Though foca means seal
in Spanish, the name stands for "Fingerprinting Organizations with Collected Archives."
It can be downloaded from https://www.elevenpaths.com/labstools/foca/index.html.
After downloading the zip file, simply extract it and execute the application file
inside the bin folder.
To use FOCA we simply need to create a new project, provide it with a name and
the domain to scan. Once this is saved as a project file, FOCA allows us to choose
the search engines and the file extensions that we need to search for. After that we
can simply start by clicking on the "Search All" button. Once we click it,
FOCA will search for the ticked file types on the mentioned domain using
different search engines. Once this search is complete it will display a list of all the
documents found, with their type, URL, size, etc.
Now we have the list of the documents present on the domain. The next thing we need
to do is download the file(s) by right-clicking on any one and choosing the option
Download/Download All. Once the download is complete the file(s) are ready for
inspection. Now we right-click on the file(s) and click on the Extract Metadata
option. Once this is complete we can see that, under the Metadata option in the
right-hand sidebar, FOCA has listed all the information extracted from the document(s).
This information might contain the username of the system used to create the
file, the exact version of the software application used to create it, system paths, and
much more, which can be very helpful for an attacker. Metadata extraction is
not the only functionality provided by FOCA; we can also use it to identify
vulnerabilities, perform network analysis, search for backups, and carry out much
more information gathering, though metadata extraction remains its most prevalent
functionality.
FIGURE 7.6
FOCA result.
METAGOOFIL
Similar to FOCA, Metagoofil is yet another tool to extract metadata from documents
which are available online. Metagoofil is basically a Python-based command-line tool.
The tool can be downloaded from https://code.google.com/p/metagoofil/downloads/
list. Using this tool is fairly easy; there are a few simple switches that can be used to
perform the task.
The list of options is as follows:
Metagoofil options
-d: domain to search
-t: filetype to download (pdf, doc, xls, ppt, odp, ods, docx, xlsx, pptx)
-l: limit of results to search (default 200)
-h: work with documents in directory (use "yes" for local analysis)
-n: limit of files to download
-o: working directory (location to save downloaded files)
-f: output file
We can provide a query such as the one mentioned below to run a scan on the
target domain and get the result in the form of an HTML file, which can be easily read
in any browser:
metagoofil -d example.com -t doc,pdf -l 100 -n 7 -o /root/Desktop/meta -f /root/
Desktop/meta/result.html
FIGURE 7.7
Metagoofil interface.
Similar to FOCA, Metagoofil also performs a search for documents using search
engines and downloads them locally to perform metadata extraction using various
Python libraries. Once the extraction process is complete the results are simply
displayed in the console. As mentioned above, these results can also be saved as an
HTML file for future reference using the -f switch.
A sample run against a target domain shows the usernames found (e.g., amitp,
ankurt, siddharth) and the software versions found (e.g., Microsoft Office Word,
Acrobat Distiller 5.0.5 (Windows), PScript5.dll Version 5.2, OpenOffice.org 3.1).
FIGURE 7.8
Metagoofil result.
Similarly there are other tools which can be used for metadata extraction from various
different files; some of these are listed below:
• MediaInfo—audio and video files (http://mediaarea.net/en/MediaInfo)
• GSpot—video files (http://gspot.headbands.com/)
• VideoInspector—video files (http://www.kcsoftwares.com/?vtb#help)
• SWF Investigator—SWF/Flash files (http://labs.adobe.com/downloads/swfinvestigator.html)
• Audacity—audio files (http://audacity.sourceforge.net/)
IMPACT
The information collected using metadata extraction can be handy for crafting
many different attacks on a victim, whether by stalkers, people with wrong motivations,
or even government organizations, and the real-life scenario can be worse than what
we can expect. The information collected from the above process provides the victim's
device details, areas of interest, and sometimes geolocation as well; information
such as username, software used, operating system, etc. is also very critical for an
attacker. This information can be used against the victim through simple methods
such as social engineering, or to exploit a device-specific vulnerability, and can even
harm the victim personally in real life, as it may reveal the exact locations where the
victim generally spends time.
And all this is possible because of some data that mostly nobody cares about;
some might not even realize its existence, and even those who do are mostly unaware
of where this data can lead and how it makes their real as well as virtual life
vulnerable.
We have seen how much critical information is revealed through documents and
files uploaded without us realizing it, and how this data can be turned into critical
information against a victim and used as an attack vector. There must be a way to
stop this, and it's called data leakage protection (DLP).
SEARCH DIGGITY
In the last chapter we learned about the advanced search features of this interesting
tool. For a quick review, Search Diggity is a tool by Bishop Fox which has a huge set
of options and a large database of queries for various search engines, which allow us
to gather compromising information related to our target. But in this chapter we are
most interested in one specific tab of this tool, and that is DLP.
There is a wide range of options to choose from in the sidebar of the DLP tab in
Search Diggity. Some of the options are credit card, bank account number, passwords,
sensitive files, etc.
The DLP tab is a dependent one; we cannot use it directly. First we have to run
some search queries on a domain of our interest, then select and download all the
files found once the search query completes, and then provide the path in the DLP
tab to check whether any sensitive data is exposed to the public for that particular
domain. To do so we can choose either the Google tab or the Bing tab, which means
either the Google or the Bing search engine, and there select the "DLPDiggity
initial" option to start searching for backups, config files, financial details, database
details, logs, and other files such as text or Word documents, and many more, from
the domain of our interest. Though there is an option to choose only some specific
suboptions from the "DLPDiggity initial" option, for demo purposes let's search
for all the suboptions. After the query completes we will get all the available
files in tabular format in the result section of the tool. Select all the files we got
and download them. It will save all the files in the default path, in a folder called
DiggityDownloads.
The results can sometimes be scary, showing credit card numbers, bunches of
passwords, etc. That is the power of this tool. But our main focus here is not the
discovery of sensitive files but DLP. So get all the details from the tool's final result;
it shows, in an easy and understandable manner, what data is available in which page
or document, so that the domain owner can remove or encrypt it to avoid data loss.
METADATA REMOVAL/DLP TOOLS
DLP is an important method to avoid data loss, and the above example is quite generic,
just to give us some idea of how DLP works. As per our topic we are more interested
in metadata removal, and there are different tools available to remove metadata;
we can also call them metadata DLP tools. Some of them are mentioned below.
METASHIELD PROTECTOR
MetaShield Protector is a solution which helps to prevent data loss through office
documents published on a website. It is installed and integrated at the web server level
of the website. Its only limitation is that it is available only for the IIS web
server. Other than that, it supports a wide range of office documents; some of the
popular file types are ppt, doc, xls, pptx, docx, xlsx, jpeg, pdf, etc. On a request for any of
these document types, it cleans the document on the fly and then delivers it. MetaShield
Protector can be found at https://www.elevenpaths.com/services/html_en/metashield.html.
The tool is available at https://www.elevenpaths.com/labstools/emetrules/index.html.
MAT
MAT, or the metadata anonymization toolkit, is a graphical user interface tool which
helps to remove metadata from different types of files. It is developed in Python and
utilizes the hachoir library for the purpose. We discussed the hachoir Python library
and one of its projects in the hachoir-metadata portion earlier; this is another
project based on the same library. The details can be found at https://mat.boum.org/.
The best thing about MAT is that it is open source and supports a wide range of
file extensions such as png, jpeg, docx, pptx, xlsx, pdf, tar, mp3, torrent, etc.
MyDLP
It is a product by Comodo, which also provides a wide range of security products and
services. MyDLP is a one-stop solution for different potential data leak areas. In
an organization, not only documents but also e-mails, USB devices, and other similar
devices are potential sources of data leaks. MyDLP allows an organization to
easily deploy and configure the solution to monitor, inspect, and prevent all outgoing
critical data. The details of MyDLP can be found at http://www.mydlp.com.
CHAPTER 8
Online Anonymity
INFORMATION IN THIS CHAPTER
• Anonymity
• Online anonymity
• Proxy
• Virtual private network
• Anonymous network
ANONYMITY
Anonymity, by its basic definition, means "being without a name." Simply
understood, someone is anonymous if his or her identity is not known. Psychologically
speaking, being anonymous may be perceived as a reduction in accountability
for the actions performed by the person. Anonymity is also associated with privacy,
as sometimes it is desirable not to have a direct link with a specific entity, though
sometimes the law requires an identity to be presented before and/or during an action.
In the physical world we have different forms of identification, such
as the Social Security Number (SSN), driving license, passport, etc., which are widely
accepted.
ONLINE ANONYMITY
In the virtual space we do not have any concrete form of ID verification system. We
usually use pseudonyms to make a statement. These pseudonyms are usually
not related to our actual identity and hence provide a sense of anonymity. But the
anonymity present on the internet is not complete. Online we may not be identified
by our name, SSN, or passport number, but we do reveal our external IP address,
and this IP address can be used to trace back the computer used. Also, on some
platforms such as social network websites, we create a virtual identity which relates to
our relationships in the physical world. Some websites have also started, in the name
of security, to ask users to present some form of identification or information which
can be related directly to a person. So basically we are not completely anonymous in
cyberspace; usually we do reveal some information which might be used to trace the
machine and/or the person.

Hacking Web Intelligence. http://dx.doi.org/10.1016/B978-0-12-801867-5.00008-2
Copyright © 2015 Elsevier Inc. All rights reserved.
WHY DO WE NEED TO BE ANONYMOUS
There are many reasons to be anonymous, and different people have different reasons
for it: some may want to be anonymous because their work demands it, such as those
who are into cyber investigation or journalism, and some might want to be anonymous
out of concern for their privacy. There are times when we want to protest
something good, but doing so openly might create problems, so we want to
be anonymous. And just as in physical life people who do bad things, like a criminal
after committing a crime, want to go underground, in virtual life on the
internet cyber-criminals and hackers want to be anonymous.
Being anonymous is just a choice. It does not always need a reason; it is simply a
state to be in, a virtual lifestyle: some want to enjoy it and others might be forced
into it. Similar to the physical world, we may have a need or desire to stay anonymous
on the internet. It may just be that we are concerned about our privacy, we want
to make a statement but won't do it under our true identity, we need to report
something to someone without getting directly involved, we want to communicate
sensitive information, or we simply want to be a stranger to strangers (anonymous
forums, chat rooms, etc.). Apart from the reasons mentioned, we may simply want
to bypass a restriction put up by an authority (e.g., college Wi-Fi) to visit certain
portions of the web. The motivation behind it can be anything, but the requirement is
surely there.
People might think being anonymous means just hiding one's identity; it can also
be about hiding what you are doing and what you want to be. A simple
example can help us understand this. Let's say we wanted to buy something and
visited an e-commerce site to buy it. We liked the product but for some reason
did not buy it. Later, while surfing normally, we may find advertisements for
the same product all over the internet. It's a marketing practice of the e-commerce
giants: tracking a user's cookies to understand his or her likes and dislikes and serving
advertisements accordingly.
Some might like this and some might not. It's not just that somebody is
monitoring what we are doing on the internet, but also that we are flooded with ads
about similar things to lure us into buying. To avoid such scenarios, people might
also prefer to browse anonymously. As a quick reminder, there are private browsing
options available in most browsers, and there are specific anonymous browsers
available that do this work for us.
In this chapter we will deal with different ways to stay anonymous online. Hundred
percent anonymity cannot be guaranteed on the internet; still, with the tools and
techniques mentioned in this chapter, we can hide our identity to a reasonable
level.
WAYS TO BE ANONYMOUS
There are many ways to be anonymous, and there are many aspects of being
anonymous. Some might focus on hiding personal details, for example on social
networking sites, by using aliases, generic or fake information, a generic
e-mail ID, and other such details. Some might want to be anonymous while browsing so
that nobody can track what resource they are looking into. Some might want to hide
their virtual address, such as their IP address.
There are different ways to achieve the above conditions, but the major and popular
solutions available are either a proxy or a virtual private network (VPN). Though
there are other methods to be anonymous, these two are the most widely used and we
will focus on them in this chapter.
PROXY
Proxy is a word generally used for doing something on behalf of someone or something
else. Similarly, in technology, a proxy can be treated as an intermediate solution that
forwards the request sent by the source to the destination, collects the response from the
destination, and sends it back to the source.
It is one of the most widely used solutions for anonymity; when used for this
purpose, its role is to hide the IP address. There are different proxy solutions available,
such as web proxies, proxy software, etc. Basically, all the solutions work on the same
basic principle: to redirect traffic to the destination from some other IP address. The
process might differ from solution to solution but the bottom line remains the same.
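In code, the redirection principle is simply that the client hands every request to the intermediate instead of the destination. A minimal sketch using Python's standard library (the proxy address here is a placeholder, not a real service):

```python
import urllib.request

def build_proxy_opener(proxy_url):
    """Build an opener that sends all HTTP/HTTPS traffic through proxy_url,
    so the destination server sees the proxy's IP instead of ours."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)

opener = build_proxy_opener("http://127.0.0.1:8080")  # placeholder proxy address
# opener.open("http://whatismyipaddress.com/") would now go via the proxy.
```

Everything else about the request is unchanged; only the network path, and therefore the source IP visible to the destination, differs.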
Though a proxy can be used for many purposes apart from being anonymous,
we will focus only on anonymity, as the chapter demands.
Before getting into the deep technical aspects of proxies, let's look at a
workaround for being anonymous. In earlier chapters we learned how to use search
engines efficiently and about power searching; now it's time to look at how a search
engine can be used as a proxy to provide anonymity.
Google, being a popular search engine, can also be used as a proxy with its feature
called Google Translate. Google provides its services in many countries apart
from the English-speaking ones, and it supports multiple languages. The Google
Translate option allows a user to read web content in another language; for a
generic example, non-English content can be translated to English and vice versa.
This feature allows a user to have a Google server forward the request and collect
the response on his or her behalf, which is the basic function of a proxy.
To test this, first we will look up our own IP address using a site called
http://whatismyipaddress.com/ and later use Google Translate to check the
same site. The job of this site is to tell us the IP address from which the request
reached it. If the IP address differs between normal browsing and browsing through
Google Translate, it means we achieved anonymity using Google Translate.
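The workaround can be scripted too. At the time of writing, Google Translate accepted the page to translate as a `u` query parameter; the helper below simply builds such a URL. Treat the parameter names as an assumption that may change at any time.

```python
from urllib.parse import urlencode

def translate_proxy_url(target, to_lang="en"):
    """Build a Google Translate URL that makes Google fetch `target` for us.

    Parameter format ('sl', 'tl', 'u') as observed at the time of writing;
    it is not a documented, stable API.
    """
    query = {"sl": "auto", "tl": to_lang, "u": target}
    return "https://translate.google.com/translate?" + urlencode(query)

url = translate_proxy_url("http://whatismyipaddress.com/")
```

Opening the resulting URL makes the request to the target come from Google's servers rather than from our own machine.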
Browsing http://whatismyipaddress.com/ through Google Translate now reports a
Google-owned IP address (66.249.82.121, ISP: Google, flagged as a confirmed proxy
server) instead of our own.
FIGURE 8.3
Page opened inside Google Translate.
We can see from the above result that the IP addresses of direct browsing and of
browsing using Google Translate are different; thus we can use Google Translate
as a proxy server to serve our purpose. In many cases it will work fine. Though it's
just a workaround, it's very simple and effective. It might not be helpful in terms of
full anonymity, but we can still use this method where we need a quick anonymity
solution.
PROXY IN TERMS OF ANONYMITY
We just came across an example where a search engine feature can serve as a proxy,
but the point to be considered is anonymity. There are different levels of anonymity
with different proxy solutions. Some proxies just hide our details while keeping
them in their logs; some proxies can be detected as proxies by the destination server
and some cannot. That is not the best situation if we want full anonymity. There
are some solutions available which cannot be detected as proxies by the destination
server and which also delete all user details the moment the user ends the session;
those are the best solutions for full anonymity. It all depends on our requirements
what service or kind of proxy we choose, because a fully anonymous proxy might
charge the user some amount for the solution.
TYPES OF PROXY SOLUTIONS
There are different types of proxy solutions available, categorized both by anonymity
level and by type, such as application-based or web-based. Let's start by exploring
some of the available options in application-based proxies.
FIGURE 8.4
UltraSurf interface.
A small drawback of this tool is that it supports only Windows, and another is
that IP-checking solutions detect it as a proxy server. But as we discussed earlier,
it can be used in various other situations depending on our requirements, and it's
easy to use: just download, run, and browse anonymously.
JonDo
JonDo, previously known as JAP, is a proxy tool available at https://anonymous-
proxy-servers.net/en/jondo.html.
It is available for a wide range of operating systems such as Windows, Mac,
different flavors of Linux, and also Android mobile. The full-fledged documentation
on how to install and use it makes it very accessible as a proxy solution. Different
proxy solutions come in different forms, and JonDo also provides a variant for
anonymous Firefox browsing known as JonDoFox.
Before exploring JonDo, let's first look into the Firefox anonymous browsing
solution, i.e., JonDoFox. It can be found at https://anonymous-proxy-servers.net/
en/jondofox.html.
Like JonDo, JonDoFox is also available for different operating systems such as
Windows, Mac, and Linux, and users can download it for their operating system from
the above URL. Documentation on how to install it is available right next to the
download link, but let's download and install it while we discuss it further.
Windows users will get JonDoFox.paf after downloading. Installing it
creates a Firefox profile named JonDoFox; if the user selects this profile, it comes
with many Firefox add-ons, such as a cookie manager, an ad blocker, etc. But to use
it for full anonymity the user needs to install certain dependent software such as Tor.
JonDoFox is good to use, but the user has to install all the dependent software
after installing it. Some might not love to do so, but it is still a great solution
for browsing anonymously.
Like JonDoFox, JonDo can also be downloaded from the above URL, which provides
the installer. Windows users will get an exe file, "JonDoSetup.paf," after downloading.
It can be installed on the operating system we are using, or as a portable version
that can be carried on a USB drive; users choose according to their requirements.
The only dependency of this software is Java, and as we discussed earlier how to
install that, we are not going to touch on it here again; in any case, the installer also
installs Java if it does not find a compatible version on the operating system. Once
JonDo is installed, we can double-click its desktop icon to open it. By default,
installation creates a desktop icon and enables JonDo to start at Windows startup.
JonDo provides full anonymity and fast connections only to premium users, but
we can still use it. First we need to activate it with a free code; a test coupon can be
found at https://shop.anonymous-proxy-servers.net/bin/testcoupon?lang=en, but we
need to provide our e-mail address to get it.
FIGURE 8.5
JonDo interface.
After providing the e-mail address we will receive a link at our e-mail ID. Visit the
link to get the free code. Once we have the free code, enter it in the software to
complete the installation process.
Despite the pros and cons of this service discussed above, it is still a very good
proxy solution for anonymous browsing, and there are other features available such
as sending e-mail and checking e-news. But as we are more focused on hiding our
details while browsing, we will conclude here.
Zend2
This is also a web-based proxy solution, unlike anonymouse.org, which only supports
the http protocol; users cannot use anonymouse.org to browse popular sites such as
Facebook and YouTube, as these sites force an https connection.
https://www.zend2.com/ has no restrictions on https-enabled, or technically
SSL-enabled, sites. It allows the user to surf both http and https sites, so it can also
be used to check e-mail.
4" fl^ttps^^vA^endiconJ ~ C 0 • i«nd2
Most Visited 4ft Getting Suited Suggested Sites Web Slice Gallery
frotatcjoin of S^><*<*cdb>
r-1
HOME ABOUT US PRIVACY POLICY CONTACT U
>
Free Membership
Insight Survey
Report
$$##4 jd
(fiJiic! Apricot ^G
SURF >
(Options)
[3 Encrypt URL □ Encrypt Page 13 Allow Cookies
|?1 Remove Scripts (7) Remove Objects
FIGURE 8.11
Zend2 homepage.
Apart from that, it also provides a special GUI for two popular web resources,
Facebook and YouTube. For Facebook: https://zend2.com/facebook-proxy/;
for YouTube: https://zend2.com/youtube-proxy/. The YouTube proxy page contains
instructions on how to unblock YouTube if it is blocked in your school, college, or
office, or by the ISP, while the Facebook proxy page contains general information on
how this web proxy works.
FIGURE 8.14
CyberGhost interface.
The interface of the application is pretty simple. We can make configuration
changes and also upgrade to a paid account from it. On the home screen the
application displays our current IP address, with its location on a map. To start using
the service we simply need to click on the power button icon; once we click on it,
CyberGhost will initiate a connection to one of the servers and will display a new
location once the connection is made.
FIGURE 8.15
CyberGhost in action.
In the settings menu of CyberGhost we can also make changes such as Privacy
Control and Proxy which further allows us to hide our identity while connected online.
Hideman
Similar to CyberGhost, Hideman is another application which allows us to conceal our identity. The client can be downloaded from https://www.hideman.net/. Like CyberGhost, Hideman does not require many configuration changes before use; simply install the application and we are good to go. Once installed, it provides a small graphical interface which displays our IP and location. Below that there is an option to choose the country of connection, which is set to "Automatically" by default. Once this is done we simply need to click on the Connect button and the connection will be initiated. Currently Hideman provides free usage for 5 hours a week.
FIGURE 8.16
Hideman interface.
164 CHAPTER 8 Online Anonymity
Apart from the mentioned services there are also many other ways to utilize VPNs for anonymity. Some service providers supply VPN credentials which can be configured into any VPN client, while others provide their own client as well as the credentials.
ANONYMOUS NETWORKS
An anonymous network is a bit different in the way it operates. Here the traffic is routed through a number of different users who have created a network of their own inside the internet. Usually the users of the network are the participants, and they help each other relay the traffic. The network is built in such a way that the source and the destination never communicate directly with each other; instead, the communication is done in multiple hops through the participating nodes, and hence anonymity is achieved.
The Onion Router
Tor stands for "The Onion Router." It is one of the most popular and widely used methods to stay anonymous online. It is basically a piece of software plus an open network which allows its users to access the web anonymously. It started as a US Navy research project and is now run by a nonprofit organization. The user simply needs to download and install the Tor application and start it. The application starts a local SOCKS proxy which then connects to the Tor network.
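To make the local-proxy idea concrete, the sketch below builds the raw SOCKS5 messages a client (for example, a browser pointed at Tor's default local port 9050) would send. The helper function and the example hostname are ours for illustration, and no traffic is actually sent. Note that the request carries the hostname itself (address type 0x03), which lets Tor resolve the name instead of the local DNS server.

```python
import struct

def socks5_connect_request(host: str, port: int) -> bytes:
    """Build a SOCKS5 CONNECT request that hands the hostname to the
    proxy (ATYP=0x03), so name resolution happens inside the Tor
    network rather than through the local DNS server."""
    hostb = host.encode("ascii")
    return (b"\x05\x01\x00"              # VER=5, CMD=CONNECT, RSV=0
            + b"\x03"                    # ATYP=3: domain name follows
            + bytes([len(hostb)]) + hostb
            + struct.pack(">H", port))   # destination port, big-endian

# First message on the wire: VER=5, one auth method offered, 0x00 = none
greeting = b"\x05\x01\x00"
request = socks5_connect_request("example.com", 80)
```

A real client would send `greeting`, read the proxy's method-selection reply, and then send `request` over the same socket to 127.0.0.1:9050.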
Tor uses layered encryption over bidirectional tunnels. What this means is that once the user is connected to the Tor network, he/she sends out the data packet with three layers of encryption (in the default configuration) to the entry node of the Tor network. This node removes the uppermost layer of encryption, as it holds the key for that layer only; the data packet is still encrypted, so this node knows the sender but not the data. The packet then moves to the second node, which similarly removes the current uppermost encryption layer with its own key; this node knows neither the data nor the original sender. The packet further moves to the next node of the Tor network, which removes the last encryption layer using the key that works for that layer only. This last node, also called the exit node, has the data packet in its raw form (no encryption), so it knows what the data is, but it is not aware of who the actual sender is. The raw data packet is then sent over the public internet to the desired receiver without revealing the original sender. As already stated, this is bidirectional, so the sender can also receive the response in a similar fashion. One thing that needs to be mentioned here is that the Tor nodes between which the data packet hops are chosen randomly; once the user wants to access another site, the Tor client will choose another random path between the nodes. This complete process is termed onion routing.
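The layer-by-layer process can be illustrated with a toy script. This is only a sketch of the wrapping and peeling idea: the XOR "cipher", the per-node keys, and the message are ours, and real Tor uses proper key exchange and AES, not anything like this.

```python
import hashlib
from itertools import cycle

def xor_layer(data: bytes, key: bytes) -> bytes:
    # Toy symmetric cipher: XOR with a SHA-256 derived keystream.
    # Applying it twice with the same key restores the input.
    stream = hashlib.sha256(key).digest()
    return bytes(b ^ k for b, k in zip(data, cycle(stream)))

# One key per relay in the chosen circuit: entry, middle, exit.
keys = [b"entry-key", b"middle-key", b"exit-key"]
packet = b"GET / HTTP/1.1"

# The client wraps the payload in three layers (exit layer innermost).
for key in reversed(keys):
    packet = xor_layer(packet, key)

# Each relay peels exactly one layer using the only key it holds;
# only after the exit relay's layer is removed is the payload raw.
for key in keys:
    packet = xor_layer(packet, key)
```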
So Tor is pretty good at what it does, and we just learned how it works. But as the traffic needs to pass through different nodes (relay points) and cryptographic functions are involved, it is pretty slow. Apart from this, we are also trusting the exit nodes with the data (they can see the raw packet).
Tor is available in many different forms: as a browser bundle, as a complete OS package, etc. The browser bundle is the recommended one as it is completely preconfigured, very easy to use, and comes with additional settings which help keep the user safe and anonymous. The browser bundle is basically a portable Firefox browser with Tor configured. It also contains some additional addons such as HTTPS Everywhere and NoScript. The Tor browser can be downloaded from https://www.torproject.org/download/download-easy.html.en. Once it is downloaded we simply need to execute the exe file and it will extract itself into the mentioned directory. After this has completed we simply need to execute the "Start Tor Browser" application. It will present us with the choice to connect directly to the Tor network or to configure it before going forward. General users simply need to click on the Connect button; in case the network we are connected to requires a proxy or other advanced settings, we can click on the Configure button to make these settings first. Once we are good to go, we can connect, and the Tor browser will open up as soon as the connection is made. Apart from this, other packages which allow us to run bridge, relay, and exit nodes can be downloaded from https://www.torproject.org/download/download.html.en.
FIGURE 8.17
Tor Browser.
Apart from allowing users to surf the web anonymously, Tor also provides another interesting service, about which we will learn in the next chapter.
Invisible Internet Project
I2P stands for Invisible Internet Project. Similar to Tor, I2P is also an anonymous
network. Like any network there are multiple nodes in this network, which are used
to pass the data packets. As opposed to Tor, I2P is more focused on internal services.
Similar to Tor, I2P also provides other services, which we will discuss in the next chapter. Browser addons like FoxyProxy (http://getfoxyproxy.org/) can be used to make the proxy changes easily in the browser.
The individual techniques we have discussed in this chapter can also be chained together to make tracing even more difficult. For example, we can connect to a VPN-based proxy server, further configure it to connect to another proxy server in another country, and then use a web-based proxy to access a website. In this case the web server will get the IP address of the web-based proxy used to connect to it, which in turn will get the IP address of the proxy server we connected to through the VPN; we can also increase the length of this chain by connecting one proxy to another. There is also a technique called proxy bouncing or hopping, in which the user keeps jumping from one proxy to another using an automated tool or a custom script with a list of proxies. In this way the user keeps changing his/her apparent identity after a short period of time and hence becomes very difficult to trace. This can also be implemented on the server side.
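The rotation logic of such a bouncing script can be sketched as below. The proxy addresses are placeholders, and actually relaying traffic through each proxy is left out; only the "pick the next proxy for each request" part is shown.

```python
from itertools import cycle

# Hypothetical proxy list; in practice this would be read from a file.
PROXIES = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:3128"]

def proxy_rotator(proxies):
    """Yield the next proxy for every outgoing request, looping forever,
    so the apparent source address keeps changing over time."""
    yield from cycle(proxies)

rotator = proxy_rotator(PROXIES)
picked = [next(rotator) for _ in range(5)]  # proxy to use per request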
Some scenarios in which people still get caught after using these tools/techniques:
• The user of a specific network (e.g., a university) is known, and it is also known which one of them was connected to a specific proxy server/Tor around a specific time.
• Rogue entry and exit points. In an anonymous network like Tor if the entry point and the exit
point can correlate the data packet based on its size or some other signature, they can identify
who the real sender might be.
• DNS leak. Sometimes even when we are connected to an anonymous network our machines
might send out the DNS requests to the default DNS server instead of the DNS server of the
anonymous network. It means that the default DNS server now may have a log that this specific
address resolution was requested by this IP at this point of time.
• Leaked personal information. Sometimes people who are anonymous on the internet leak some information which can be used to directly link it to them, such as phone numbers, the same forum handles they use when they are not anonymous, unique IDs, etc.
• Metadata. As discussed in the last chapter there is so much hidden data in the files that we use
and it might also be used to track down a person.
• Hacking. There can be security holes in any IT product which can be abused to identify the real
identity of the people using it.
• Basic correlation. As shown in the first scenario, correlation can be used to pinpoint someone
based on various factors such as timing, location, restricted usage, and other factors.
Some of the suggestions/warnings for using Tor are listed at https://www.torproject.org/download/download-easy.html.en#warning. These should be followed with every tool/technique discussed above, where applicable. Also use a separate browser for anonymous usage only and do not install addons/plugins which are not necessary.
So we learned about various ways to stay anonymous online, but as stated earlier, 100% anonymity cannot be guaranteed online. What we can do is try to leak as little information about ourselves as possible. The methods discussed in this chapter are some of the most popular and effective ways to do this. Online anonymity has various use cases, such as privacy, protest, accessing what is restricted by an authority, business, law enforcement, and journalism, but it can also be used by people to perform illegal activities like malicious hacking and illegal online trade.
CHAPTER 9
Deepweb: Exploring the Darkest Corners of the Internet
INFORMATION IN THIS CHAPTER
• Clearweb
• Darkweb
• Deepweb
• Why to use it
• Why not to use it
• Deepweb: Tor, I2P, Freenet
INTRODUCTION
In this chapter we will start from where we left off in the previous one. We learned about various tools and techniques for staying anonymous online and also discussed some of the ways in which people still get caught. Here we will deal with terms like darknet and deepweb and understand some of the fundamental differences.
One of the most efficient ways discussed to stay anonymous was connecting to anonymous networks like Tor and I2P. We will take this topic further and see what else we can do with it and how it relates to the topic of interest for this chapter.
Until the recent past, terms like darknet and deepweb were not too popular. They were mostly a topic of interest for people who wanted to stay anonymous and those in IT (especially information security). Recently there have been some news stories related to these topics, which have made people interested in what they are, how they operate, what to expect there, etc. We will cover all those things here and see if there is anything of interest for us.
Before going any further with the technical details, let's understand the basic definitions of the terms we will be dealing with in this chapter.
CLEARWEB
We have already discussed in previous chapters how search engines work. Simply stated, a search engine works by following the links on a web page, then those on the next one, and so on. The part of the web which can be accessed by a search engine this way is called the clearweb. What this means is that anything we get as a result of a search engine query is part of the clearweb.

Hacking Web Intelligence. http://dx.doi.org/10.1016/B978-0-12-801867-5.00009-4
Copyright © 2015 Elsevier Inc. All rights reserved.
DARKWEB
As users, we click on different links on a webpage, but that is not the only way we interact with a website. Sometimes we have to submit some text to get the desired page (e.g., a search box), sometimes we have to authenticate before accessing a specific page (e.g., a social network login), and sometimes there are things like CAPTCHAs which need to be entered before moving further.
So apart from the web that is accessed by search engines, there is still a huge amount of data that exists in pages not touched by web spiders/crawlers. This part of the web is known as the darkweb or darknet.
DEEPWEB
Now we have a clear separation of the web into two parts, clearweb and darkweb, based upon their accessibility to search engines. Now we will move a little deeper.
The darkweb comprises a huge part of the overall web. Inside this darkweb there exists another section which is called the deepweb. The deepweb is not only inaccessible to search engines, it also cannot be accessed directly by the standard browsers we use daily. This portion of the web is hidden deep inside the web and requires special applications and configurations to be accessed, and hence is called the deepweb.
Now we have a clear understanding of what darkweb and deepweb are. We are well aware of how to access the regular darkweb and do it on a regular basis: pages like social media profiles which require login, search result pages within a website, and dynamically generated pages are some examples. However, if we need to access the deepweb, we need to make special arrangements. Before getting into those details, let's understand a bit more about the deepweb.
As stated earlier, the deepweb is a part of the darkweb. The question arises how it can exist inside the darkweb and yet not be directly accessible. The answer is that it exists in the form of a network inside the internet, which is itself a huge network; the deepweb is created as a part of the internet, but to access this specific network we need the right tools so that a connection can be made to it. Once we have the means to connect to it, we can access it.
In this fancy land of the deepweb we can find all sorts of things, like illegal drugs, weapons, art, and other black market items. On the other hand, it is also used by people to speak freely, exchange ideas, etc.
WHY TO USE IT?
If we are a whistleblower, cyber investigator, cyber journalist, government intelligence agent, or cyberspace researcher, then this is the place for us. It will help us understand how the underground cyberspace works. It will give us ideas about the private 0-days, targets, and attack patterns of cyber-crime, etc. It will help us predict the next attack pattern by understanding the underground community's mind-set through the technology they use most frequently.
It also provides freedom of speech, so if you want to protest for a good cause, this is the place for you. For investigating a cyber-crime this can be a useful place: as most of the underground community operates here, there is a chance of getting an ample amount of proof from this place. It can also be used to keep track of the online activities of a person or group.
There are dedicated services for optimized use of the deepweb, such as secure file uploading facilities where activists or whistleblowers can anonymously upload documents. There are services for people spreading word of things others should know, sharing what's happening all around them, etc. There are online forums to discuss technology, politics, and much more; so if we have these kinds of specific requirements, or similar, then we can use the deepweb.
WHY NOT TO USE IT?
Apart from utilizing this space for ethical motives, some people also use it to perform many illegal activities. There are many places in this area where we can find people selling drugs, fake IDs, money laundering services, hackers for hire, etc. Some websites even claim to provide assassins for hire. Apart from this, it might also contain websites which provide many disturbing things. One must be very careful while accessing or downloading any content from such places, as it might be illegal to access it or have it on our computers.
DARKNET SERVICES
TOR
One of the most popular portions of the deepweb is the *.onion domains. In the last chapter we learned about Tor, how it works, and also how to use it to stay anonymous. The same Tor also allows us to create and access one of the largest portions of the deepweb. We are already aware of how to use the Tor browser bundle to access the regular web; now that same tool can be used to access places which are not directly reachable.
We simply need to download the Tor browser bundle, extract it, and run the Tor browser. Once the connection to the Tor network is made we are good to go. Apart from accessing regular websites, Tor allows us to create and access *.onion websites. If accessed through a regular browser without Tor configured, these websites will simply display a "The webpage is not available" message or some kind of error or redirect message, whereas they will open up like regular websites through the Tor browser or a browser configured to use Tor as a proxy.
Let's start exploring these Tor-based domains. One of the most common places to start with is "The Hidden Wiki." The address of this wiki is http://zqktlwi4fecvo6ri.onion/wiki/index.php/Main_Page. Notice that this URL does not end in .com, .net, .org, or another familiar domain suffix, but in .onion. First try to open this URL in a regular browser; does it open up? Now open this URL in our Tor browser. We will get a wiki page with a huge list of other .onion domains divided category-wise. The categories listed include financial services, anonymity and security, whistleblowing, P2P file sharing, etc. We can explore this wiki further and check out some of the interesting links listed in it.
FIGURE 9.1
The Hidden Wiki.
Similarly there is another wiki, "Tor Wiki," which lists a huge number of .onion domains. It also presents its various categories in a neater way, and it makes it easier to explore the listed domains by marking them as verified, caution, or scam.
FIGURE 9.2
TOR Wiki.
The search engine DuckDuckGo, which we discussed in a previous chapter, also has a .onion address: http://3g2upl4pq6kufc4m.onion/. Using this we can search the clearweb from a Tor domain.
FIGURE 9.3
DuckDuckGo Search Engine (.onion domain).
There are also some search engines, such as TORCH (http://xmh57jrzrnw6insl.onion/), available for searching the deepweb, but they seldom work properly.
As we can see in the wiki lists, there are various marketplaces which sell illegal drugs. One of the most popular was called "Silk Road," which was recently brought down by the FBI, but a new one has come up to take its place, called "Silk Road 2.0." Similarly there are many other places which claim to have illegal items, as well as various forums, boards, internet relay chats (IRCs), and other places which provide like-minded people a platform to discuss and learn things. One such board is TorChan (http://zw3crggtadila2sg.onion/imageboard/). There are various topics, such as programming, literature, privacy, etc., on which people discuss their views.
FIGURE 9.4
TorChan.
Till now we have seen how to access .onion domain websites; now let's see how to create them. To create a .onion site we first need a local web server. XAMPP is one such option, which uses Apache as the server. Once the server is installed and configured to host a local website, we need to modify the "torrc" file. This file can be found at the location "Tor Browser\Data\Tor". Open this file in an editor and add the following lines to it:
HiddenServiceDir C:\Tor\Tor_Browser\hid
HiddenServicePort 80 127.0.0.1:80
The path in front of "HiddenServiceDir" is the path where Tor will create files to store information related to the hidden service we are creating. The part in front of "HiddenServicePort" specifies the virtual port on which the service will be available, followed by the local address and port to which Tor will redirect requests for it.
We have seen how to create a Tor hidden service, but for it to be safe and anonymous we need to take various steps, as follows:
• Configure the server to not leak any information (e.g., Server Banner, error
messages).
• Do not run any service on that machine which might make it vulnerable to any
attack, or might reveal the identity.
• Check the security of the web application hosted.
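Once Tor restarts with the HiddenServiceDir lines shown earlier, it writes the generated .onion address into a file named hostname inside that directory. A small sketch of reading it follows; the directory and address here are stand-ins created for the example, not values Tor produced.

```python
from pathlib import Path

def onion_address(hidden_service_dir: str) -> str:
    """Return the .onion hostname Tor generated for a hidden service."""
    return (Path(hidden_service_dir) / "hostname").read_text().strip()

# Stand-in directory and address, mimicking what Tor would create:
demo = Path("hid_demo")
demo.mkdir(exist_ok=True)
(demo / "hostname").write_text("abcdefghijklmnop.onion\n")

addr = onion_address("hid_demo")
```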
Tor also allows us to run a hidden service through relays, but it is not advised. Relays are nodes which take part in transferring the traffic of the Tor network and act as routers for it. Relays are of different kinds: middle relays, which are the starting and middle nodes in the packet transfer chain; exit relays, which are the final nodes in the chain and connect directly to the receiver; and bridges, which are relays that are not publicly listed as Tor relays. Bridges are helpful when we are connecting to the internet through a monitored/managed network (e.g., a college network), as they make it difficult to identify whether a user is connected to Tor through that network. The applications to run these services can be downloaded from https://www.torproject.org/download/download.html.en.
I2P
In the last chapter we also learned how to be anonymous using I2P. In this chapter we will not focus on the anonymity part again, but on how I2P can help us access and create the deepweb.
Though we will find a number of places listing marketplaces and hidden services related to or accessible through I2P, and in most places the sites will claim the authenticity of the services provided, it is better to cross-check manually before using or accessing any of them, to avoid unknown consequences.
We already know how to install I2P, as we learned the same in the last chapter, but for quick reference it can be downloaded from the following URL: https://geti2p.net/en/download (here we can get bundles for Windows, Mac, different Linux versions, and also Android). Download the bundle according to your device and operating system and install it. After installation, opening I2P will open the router console on localhost (http://127.0.0.1:7657/home); otherwise, as we learned in the last chapter, we need to manually type this address into the browser's address bar. Once we get "Network OK" in the top left corner of the page, configure the browser proxy settings to 127.0.0.1:4444 to access the sites. For IRC we can use localhost:6668 in our IRC client and join #i2p for chat. After changing the browser proxy setting we will be able to visit eepsites, sites with the *.i2p extension.
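The same proxy setting can also be applied outside the browser for scripted access. Below is a minimal sketch using Python's standard library, assuming I2P's default HTTP proxy port 4444 mentioned above; building the opener touches no network, while an actual request would need the I2P router running.

```python
import urllib.request

# Route HTTP requests through the local I2P HTTP proxy (127.0.0.1:4444).
proxy = urllib.request.ProxyHandler({"http": "http://127.0.0.1:4444"})
opener = urllib.request.build_opener(proxy)

# From here, opener.open("http://forum.i2p/") would fetch an eepsite,
# provided the I2P router is up and shows "Network OK".
```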
FIGURE 9.7
I2P home.
Some of the sites are listed on the router homepage, as shown in the figure.
E.g., anonymous git hosting: http://git.repo.i2p/
Here, though we need to provide some details to push to the repository, the identity we provide will not link back to our real IP address; in this way we can use git anonymously.
FIGURE 9.8
I2P git.
Free web hosting: http://open4you.i2p/
Here we can get details of how to use the free web hosting service. Other details can be found in the forum maintained at the following URL: http://open4you.i2p/index.php. If we want to host any kind of website in the deepweb, this can be helpful.
Pastebin: http://pastethis.i2p/
It is a website generally used to save text online for a period of time, for personal use. But it is popularly used as a source to publish credentials, the latest cyber news, defaced site details, cyber-attack target details, etc. Though in the normal pastebin we need to provide certain details to paste something, here no details are required. We can also find all the pastes at the following URL: http://pastethis.i2p/all/.
FIGURE 9.9
I2P based Paste data service.
Forum: http://forum.i2p/
It is like a general forum for discussing different things in different threads. The topics may be related to I2P or something else. Depending upon our area of interest, we can take membership, log in, and read, create, or edit posts based on the permissions provided by the site.
FIGURE 9.10
I2P based forum.
Microblog: http://id3nt.i2p/
Id3nt is a microblogging site like Twitter. Here we can post whatever we want, share our views, discuss a particular topic, or reply to posts of interest. It is quite similar to a normal microblogging site.
FIGURE 9.11
Id3nt.
How to create our own site using I2P:
To create our own anonymous I2P web server we need to edit files at the following path: on a Windows machine the path is %APPDATA%\I2P\eepsite\docroot\, and on a Linux machine the path is ~/.i2p/eepsite/docroot/.
FIGURE 9.12
Eepsite files.
<html>
<!--
# If you have a 'split' directory installation, with configuration
# files in ~/.i2p (Linux) or %APPDATA%\I2P (Windows), be sure to
# edit the file in the configuration directory, NOT the install directory.
-->
<head>
<!--
Remove the following three lines to stop redirecting to the help page.
If it continues to redirect:
1) Make sure you edited the correct file (see above)
2) Clear your browser's cache.
-->
<title>I2P Anonymous Webserver</title>
</head>
<body>
<p>This is a test site</p>
</body>
</html>
FIGURE 9.13
Edit file.
After completing all the edits we need to set up the server configuration details from the following URL: http://127.0.0.1:7657/i2ptunnel/edit.jsp?tunnel=3. The options are shown in the figure below.
FIGURE 9.14
Edit server settings.
By default the site created can be accessed locally at http://127.0.0.1:7658.
FIGURE 9.15
Local site.
On the server edit page above we can set the name, description, protocol, IP and port number, as well as the domain name. There are certain advanced options too, but they are quite straightforward, so we can easily configure our web server, and anyone can then access it using the provided domain name.
Once all the configuration is complete, save the details. We will get a page where we need to start the web service we just configured, as shown in the figure below.
FIGURE 9.16
Starting the service.
Sometimes we need to add the domain name and the long base64 key generated by
the page to the router address book to access the site, as shown in the image below.
[The tunnel's "Edit server settings" page at http://127.0.0.1:7657/i2ptunnel/edit?tunnel=3
shows fields for Name (I2P Webserver test), Type (HTTP server), Description (My eepsite
testing), Auto Start, Target Host (127.0.0.1), Port (7658), Website name (devilszone.i2p),
Private key file (eepsite/eepPriv.dat), and the base64 Local destination key.]
FIGURE 9.17
Adding the name.
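The address book entry itself is just a name-to-destination mapping, one per line. A minimal sketch of what such a line looks like (the destination value below is a placeholder for the real base64 key generated by the edit page, not an actual key):

```
devilszone.i2p=<base64 destination key generated by the edit page>
```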
Now we can access the page by the domain name. In my case, as is quite clear from the
above figure, the name is http://devilszone.i2p/.
Here is the figure showing the same using the domain name in the browser.
FIGURE 9.18
Service running.
Here we learned how to find different internal sites with the *.i2p extension, how to
access them using I2P, and how to create our own I2P site for providing services. This
will help us understand the deepweb quite easily.
FREENET
Similar to Tor and I2P there is yet another anonymous network, freenet. It is one
of the oldest networks around and is known for its P2P file sharing capabilities. The
application can be downloaded from https://freenetproject.org/download.html.
Once downloaded, simply install the application and run it.
Freenet will open up a browser once it is run. The webpage displayed will provide us
a series of choices to determine the security and data usage limits we desire, and
then perform the setup accordingly. Once this setup is complete, we will be presented
with the freenet homepage. This page contains links to indexes of freenet websites (called
freesites), similar to the Tor wikis, as well as documentation related to other associated
software and HOW TO guides. The homepage also has a search box which allows us to search
through freesites. Using certain plugins such as freetalk and freemail we can also
communicate over freenet.
[The freenet homepage lists directories of websites on Freenet (Enzo's Index, Linkageddon,
Nerdageddon), freenet-related software and documentation (Freenet Social Networking Guide,
Freemail, Freesite HOWTO, Sone, jSite, Freenet Documentation Wiki), and the Freenet
team's blogs.]
FIGURE 9.19
Freenet homepage.
184 CHAPTER 9 Deepweb
Enzo's Index is one such index which lists many freesites, divided into categories.
Another such list is Linkageddon.
FIGURE 9.20
Freenet Enzo’s Index.
FIGURE 9.21
Freenet Linkageddon.
Freenet also allows us to connect with people whom we already know and who are
using freenet, under the URL http://localhost:8888/friends/. For this we simply
need to exchange a file called noderefs with them, provide this file on the
mentioned page, and simply click on the add button at the bottom. Under the URL
http://localhost:8888/downloads/ we can perform file sharing operations. Similar to
the other networks discussed, freenet also allows us to create and share our own websites
in its network. Freenet maintains its own wiki, https://wiki.freenetproject.org, which
lists information related to its different features and about how to perform different
operations, including freesite setup.
Apart from these mentioned networks there are also some other networks which
provide similar functionalities, but Tor, I2P, and freenet are the most popular ones.
In this chapter we moved on from exploring the regular internet and learned about
some less explored regions of it. We discussed in detail the deepweb: how to
access it, how to create it, and what to expect there. We also learned about its uses
and how it is misused. We have also shared some associated resources which
will help to explore further, but be warned: you never know what you might
find there, so act with your own discretion.
Till now we have learned about various tools, techniques, and sources of
information which might help us utilize the internet in a better and more efficient
way. Moving ahead, we will learn about some tools and their utility in managing,
visualizing, and analyzing all the collected data, so that we can better understand and
utilize the raw data to get actionable intelligence.
DISCLAIMER
The part of the internet that will be discussed in this chapter might also contain
illegal and/or disturbing things. Readers are advised to use their discretion and act
accordingly.
CHAPTER 10
Data Management and Visualization
INFORMATION IN THIS CHAPTER
• Data
• Information
• Intelligence
• Data management
• Data visualization
INTRODUCTION
Till now we have learned about gathering data using different methods. Generally
people think that open source intelligence (OSINT) means collecting data from different
internet-based open sources, but it's not limited to that, because if the data
we collect from different sources is not categorized properly, or we cannot find
relations between the pieces, it is just a huge amount of random data that is of
no use. We will discuss the need for managing data and analyzing its worth later, but
for the time being let's refresh what we have learned so far about collecting different
data using different sources.
From the very beginning we have focused on data extraction using different
methods. We started with search engines, where a normal user generally gets
answers for all their questions, and we also discussed how that is just a minute part
of the web, as popular conventional search engines have only a limited portion of the
internet indexed in their databases. So we learned how to use other specific search
engines to get specific data, and covered some popular features of mainstream search
engines that make them unique compared with others. Further, we learned about some
interesting tools and techniques to find data which is openly available. Later
we moved on to power searching and learned how to get desired information from
the web effectively. Then we moved to metadata and how it can be helpful. We
learned how to get all the metadata information and how we can use it for different
purposes. Last but not least, we covered the Deep Web, the part of the web which is
not directly indexed by conventional search engines, and learned how to access it
to get more information.
So for the time being we can say that we learned how to collect data from
different sources directly using some well-known solutions and also using some
Hacking Web Intelligence. http://dx.doi.org/10.1016/B978-0-12-801867-5.00010-0 187
Copyright © 2015 Elsevier Inc. All rights reserved.
unconventional tools that open even more doors to collect data. Before going any
further, let’s discuss a bit about what is data, information, and intelligence and how
they differ from one another.
DATA
"Data" is one of the most commonly used terms in any domain, especially IT. Described
in simple words, data means the RAW form of entities; it is the depiction
of facts in a basic form. For example, say we get a text file that consists of entries
such as abc.inc, xyz.com, john, 28, info@xyz.com, CTO, etc. We can see there are
certain entities, but they carry no meaning. This is the raw form. In its original
form, data does not have much worth.
INFORMATION
The systematic form of data is called information. When data is categorized based
on its characteristics, it can be called information; we can say that the aggregated
and organized form of data is information. Hence, to achieve information we need to
process data. Let's take the same example: abc.inc is a company name, xyz.com is a
domain, john is a username, 28 is an age, info@xyz.com is an email address registered
with xyz.com, and CTO is a position.
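The step from data to information can be sketched in a few lines of Python. The labeling rules below are illustrative assumptions fitted to this particular example, not a general-purpose parser:

```python
import re

# Hypothetical helper: label raw tokens so data becomes information.
# The heuristics below are assumptions for this example only.
def label_entities(tokens):
    labeled = {}
    for token in tokens:
        if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", token):
            labeled["email"] = token          # looks like an email address
        elif token.isdigit():
            labeled["age"] = int(token)       # bare number: treat as age
        elif token.endswith(".com"):
            labeled["domain"] = token         # domain name
        elif token.endswith(".inc"):
            labeled["company"] = token        # company name
        elif token.isupper():
            labeled["position"] = token       # acronym: job position
        else:
            labeled["username"] = token       # fallback: a person's name
    return labeled

raw = ["abc.inc", "xyz.com", "john", "28", "info@xyz.com", "CTO"]
print(label_entities(raw))
```

Running this turns the raw token list into labeled fields, which is exactly the data-to-information step described above.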
INTELLIGENCE
When we relate different pieces of information based on their relations with one another
and derive a meaning out of that, what we get is called intelligence. So we
analyze and interpret the information at hand according to the context to achieve
intelligence. From the same example we can derive that xyz.com and info@xyz.com
belong to the same domain, and it is quite possible that John, who is 28 years old,
is the CTO of abc.inc. These are preliminary deductions and may also be false positives,
so we need to validate them later, but for the time being the pieces of information
look related, so we can conclude the following: John, who is 28 years old, is the CTO
of abc.inc; the domain name of the same company is xyz.com; and the email id to
communicate with it is info@xyz.com.
To validate we will need to extract information from other sources and we might
get to know that the name of the CTO of abc.inc is someone named John and there
is a John who works at abc.inc whose email is info@xyz.com and similar information
which might correlate to prove our theory right or wrong. Now let's say we are
a salesperson and our job is to contact the management of different companies; then
validating that this information is right allows us to contact John and craft an email
depending upon his age group and other information about him that we might have
extracted.
The definition of intelligence may differ from person to person. This is our own
definition based on our experience, but the bottom line is that it's about the usefulness of
the information. Unlike data, which states raw facts, actionable intelligence allows
us to take informed decisions.
As we discussed earlier, data is the raw form which just contains the entities.
An entity means anything tangible or intangible: it may be a name, a place, a character, or
anything else. If it is just data, it is worthless for us; we do not know what it is about.
We can get lots of random data, but to use it we must understand what that data
is all about. Let's take another example: we get 1000 random strings, but what
do we do with them? If we come to know that those are usernames or passwords,
then those 1000 random strings are worth a lot; we can use them in a dictionary attack
or brute force etc.
It’s not the data that is always valuable, it’s the information about the data or the
information itself that is worth a lot.
Managing data is very important. Managed data can be quickly used to find relationships.
Let's say we have lots of data and we know it consists of
names, email ids, mobile numbers, etc. If we do not arrange it systematically in rows
and columns, we will lose track of it, and later when we need a particular set of
data, say the names, it will be difficult to differentiate and fetch them from a large
amount of unmanaged data.
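The idea of keeping data in fixed rows and columns so that a single field can be fetched later can be sketched with Python's csv module. The records below are hypothetical stand-ins for collected data:

```python
import csv
import io

# Hypothetical records: each row keeps name, email id, and mobile number
# in fixed columns so any one field can be pulled out later.
rows = [
    ["name", "email", "mobile"],
    ["john", "info@xyz.com", "5550101"],
    ["jane", "jane@abc.inc", "5550102"],
]

# Write the managed data as CSV (an in-memory file for this sketch).
buf = io.StringIO()
csv.writer(buf).writerows(rows)

# Later, fetch just the "name" column from the managed data.
buf.seek(0)
names = [row["name"] for row in csv.DictReader(buf)]
print(names)  # -> ['john', 'jane']
```

With the same data left unmanaged in a flat list, pulling out "just the names" would require guessing which string is which; the column structure makes it a one-line lookup.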
So it's always important to manage data by type in a categorized manner
so that we can easily use it later. As seen in previous chapters, there are
various sources of information, and every source is unique in its own way. When all the
data from different sources comes together, it creates the complete picture and allows
us to look at it from a wider perspective.
As we covered Maltego earlier, we have seen how it collects data from different
sources, but even then there are many other sources. The thing to focus on here
is that it's not about running a tool and collecting all the data; it's about running
transformations one by one to get the desired final result, extracting data from
different sources, correlating it, and interpreting it according to our needs. In most
cases it is not possible to get all the data we want from a single
source, so we need to collect different data from multiple sources to complete
the picture.
For example, let's take a case where we need to collect all the data about a
person called John. How do we do that? As John is a quite common name, it is very
difficult to get all the information. Let's start with some primary information. If we
can identify a picture of John, we might start with a simple Google search
to check the images. We might or might not get his picture; if we get the picture,
we visit the page from where Google fetched it to get to know more about
John. If not, we simply try social networking sites like Facebook or LinkedIn;
there is a chance that we can get the picture as well as the profile of John on one
or all of the social network sites. If we get the profile, then we can get further
information such as his email id, company name, position, social status, current city,
and permanent residence.
After getting those details we can use the email id to check what other places
it is used, such as other sites, blogs, forums etc. There are different online
Data management and analysis tools
FIGURE 10.3
Some commonly used flowchart symbols.
We discussed a bit about the methods which are usually used for data storage and/
or management. Now let’s move on to learn about something different than the usual
stuff and see what other great options are available out there which can help us with
our data management and analysis needs.
MALTEGO
Any open source assessment is not complete without the use of Maltego; it's an integral
part of OSINT. We already discussed a lot about this tool earlier, including how to
MagicTree simply open it, add some network address or host address to the scope, and
MagicTree will be able to build a data tree for the same. The advantage of storing
data in tree form is that if we later want to add some other data it will not affect the
existing tree; we just need to create a new tree. It stores the data in tabular or list form and
uses XPath expressions to extract data. There are many report templates that can be
customized and used for report generation.
The only limiting feature of this tool is that it supports importing only XML,
so we cannot add tools which generate text output. Although this is a limitation,
the tool is still pretty helpful for workflow automation and for data retrieval from any tool,
and it is highly recommended for pentesters.
FIGURE 10.7
MagicTree interface.
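MagicTree's approach of storing tool output as a tree and pulling values out with XPath can be mimicked in miniature with Python's standard library. The XML below is a made-up, simplified fragment, not MagicTree's actual schema:

```python
import xml.etree.ElementTree as ET

# A made-up, simplified fragment of tool output stored as a tree.
xml_data = """
<testdata>
  <host address="192.168.1.10">
    <port number="80" state="open"/>
    <port number="22" state="closed"/>
  </host>
  <host address="192.168.1.11">
    <port number="443" state="open"/>
  </host>
</testdata>
"""

tree = ET.fromstring(xml_data)

# XPath-style query: addresses of hosts that have at least one open port.
open_hosts = [
    host.get("address")
    for host in tree.findall(".//host")
    if host.findall("port[@state='open']")
]
print(open_hosts)  # -> ['192.168.1.10', '192.168.1.11']
```

The point is the workflow: once output from different tools lands in one tree, a single path expression can cut across all of it, which is what makes the XPath-based extraction useful.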
KeepNote
As the name suggests, KeepNote is a note taking application. It is a cross-platform
application which can be downloaded from http://keepnote.org/. Unlike traditional
tools for note making such as Notepad, KeepNote contains various features which
make note taking more efficient and allow us to include multiple media in our notes.
To start note taking using KeepNote we need to first create a new notebook from
the File option. Once a notebook has been created we can add new pages and subpages
into the notebook. In these pages we can keep our notes and keep them
categorized. We can simply write the text into the bottom right part of the interface.
image, marker, summary, attachment, audio notes etc. The variety of data types
allowed by Xmind makes it very easy and effective to create a mind map which
can actually translate our ideas into a visual representation. We can create mind
maps for project management, planning, decision making etc.
FIGURE 10.10
Xmind sample template.
Though the free version of Xmind has some limitations compared to the pro version,
it provides ample ways to visualize our ideas in a creative and effective manner.
There are various models and methodologies which are used in different domains
for the data analysis process. Some are generic and some fit only certain industries.
Here we give a basic approach which applies generically and can be modified
according to specific needs:
• Objective: Decide what is the question that needs to be answered.
• Identify sources: Identify and list down the sources which can provide data
related to our objective.
• Collection: Collect data using different methods from all the possible sources.
• Cleaning: From the data collected, anything that is irrelevant needs to be
removed and the gaps present need to be filled.
• Data organization: The cleaned data needs to be organized into a manner which
allows easy and fast access.
• Data modeling: Perform modeling using different techniques such as
visualization, statistical analysis, and other data analysis methods.
• Putting in context: After we have performed the analysis of the data, we need to
interpret it according to the context and then make decisions based on it.
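The steps above can be sketched as a tiny pipeline. The data and cleaning rules here are hypothetical stand-ins for whatever a real assessment collects:

```python
# Hypothetical collected data: raw entries from several sources,
# with duplicates and blanks that the cleaning step must remove.
collected = ["john@xyz.com", "", "jane@abc.inc", "john@xyz.com", "  "]

# Cleaning: drop blanks and duplicates while keeping order.
cleaned = list(dict.fromkeys(e.strip() for e in collected if e.strip()))

# Data organization: group entries by domain for easy and fast access.
organized = {}
for email in cleaned:
    domain = email.split("@")[1]
    organized.setdefault(domain, []).append(email)

# Data modeling / putting in context: even a simple count per domain
# is a small piece of analysis we can act on.
summary = {domain: len(addrs) for domain, addrs in organized.items()}
print(summary)  # -> {'xyz.com': 1, 'abc.inc': 1}
```

Each stage maps onto one bullet of the methodology; in practice each would be far richer, but the shape of the flow stays the same.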
Unlike other chapters, where we focused on data gathering, here we focused
on data management and visualization. Collecting data is important, but managing
it and representing it in a form which makes the process of analysis easy is
quite important too. As we learned earlier in this chapter, raw data is not of much
use; we need to organize and analyze it to convert it into a form which is
actionable, and the tools mentioned in this chapter assist in that process. Once we have
analyzed the data and put it in context, we will achieve intelligence, which helps us
to take decisions.
Moving forward, in the next chapter we will discuss online security. Day
by day cyberspace is becoming more insecure. New malware keeps surfacing
every now and then, attack techniques are advancing, and scammers are developing new
techniques to trick people. With all this around, there is much that we need to
safeguard ourselves from. We will discuss tools and techniques to shrink this
gap in our security and will learn how to minimize our risk.
CHAPTER 11
Online Security
INFORMATION IN THIS CHAPTER
• Security
• Online security
• Common online threats
• Identify threats
• Safety precautions
INTRODUCTION
In the previous chapters we have fiddled a lot with the internet. We have used a
variety of tools to access it in different forms, been to some of its lesser
touched areas, and learned how to stay anonymous while doing so. We
also learned about some tools which help during the analysis of all the data
we collect from this massive information source. In this chapter we are going
to touch upon a topic which has become very relevant in today's digital age:
online security. The internet is a great place where we can learn different things, share
them with others, and much more. Today the internet is available worldwide, and accessing it
is pretty easy. We have devices which allow us to stay connected even on the move.
Today we rely on the internet for many of our needs, such as shopping, paying our
bills, registering for an event, or simply staying social. Even our businesses rely on the
internet for their day to day operations. As users of the internet, we use different
platforms, click on various buttons, and visit various links on a daily basis. For an average
user it may seem pretty simple, but it involves a huge amount of technical implementation
at the backend.
Similar to our physical world, this virtual world also has security issues, and it's no
big news that people become victims of cyber-crimes daily. There are a variety
of threats which we face in this digital world, and sometimes we don't even recognize
them. For example, we all get that spam message stating that we have won a huge
amount of money in some lottery and need to share some details to receive it.
Although most of us simply ignore such messages, some people do respond and are
victimized. Similarly, we have already seen how the information we share online might
reveal something about us that we don't intend to; this kind of information in the wrong
hands can be used against us. Recently there have been many cases which involved
hackers attacking an employee's machine to gain access to corporate data. The
Hacking Web Intelligence. http://dx.doi.org/10.1016/B978-0-12-801867-5.00011-2 203
main reason behind the success of such attacks is the lack of security awareness among
users. Though understanding the technical aspects of cyber security can be
a bit complex for a nontechnical person, understanding how some of the common
attacks work, learning how to identify them, and finally knowing how to mitigate them is a
necessity for every user. One question people often ask is why
someone would try to hack us when we don't have any sensitive or financial information on
our computers. Instead of answering this question right away, let's learn about some
of the attack methods and then discuss why someone would attack an average user.
We are in a world where we love to spend more time online than socializing in person. The
reason may be anything: shopping, e-mails, online social hangouts, chats,
messages, online recharges, or banking. We may use the internet professionally,
as part of our day to day job, or personally, to browse or surf.
Anyway, the point is that the internet is now an integral part of life,
and it's quite difficult to avoid it.
Earlier we came across only one aspect, internet privacy: how to
maintain privacy using different online solutions, browser settings, or anonymity
applications. What about the other aspect that we missed, security? We
need to also put some light on the security aspects of the internet.
When we say security aspect, it's not just about using secure applications, visiting
secure sites, or having security implementations on our system such as an updated
antivirus and firewall. Security here also means data security, or to be
precise, internet data security.
Securing an organization's data alone is not enough; users' data is also quite
important, so in the case of data security we need to focus on both organizational
data and users' data. For example, let's say an organization implements proper
security mechanisms to secure its data: all kinds of security software, from antivirus,
firewall, intrusion detection system, and intrusion prevention system to all the other
security tools, are implemented or installed, but if the security question for the HR's
(human resources) e-mail id is "What is your favorite color?" and the answer is "pink",
then all these security implementations are in vain. So both users' data and
organizational data are important, and we need to take care
of both.
From an organizational view we discussed a little how metadata can disclose
certain information and how DLP (data leakage/loss prevention) can be helpful to
secure it. But from a user perspective it is also quite important not to share
details in public that can be used against us or our security. For a simple example, do
not disclose information in public that can be used to learn about our way
of thinking or our areas of interest. Let's say we disclose information
that can be used to recover our password, such as the answers to common
security questions, e.g., who is our favorite teacher, what place we like the most,
what is our mother's maiden name, etc. These are common security questions which we
generally find in different online applications, used as an additional verification
to recover passwords. If we disclose this information in public, say
on any social networking site, blog, or anywhere else, that can be a threat to
come from online where we try to access certain restricted sites such as adult sites, free
music, or software hosting sites etc. So as a user, verify the source before downloading
anything. There are various classifications of malware, some of which are defined
below.
VIRUS
Virus or Vital Information Resources Under Seize is a term borrowed from the normal
virus that affects people and can be the cause of different diseases. Similarly, a
computer virus is malicious code which, when executed in a system, infects it
and performs malicious activities like deleting data, corrupting memory, adding
random temporary data, etc. The only weakness of a virus is that it needs a trigger for
execution: if our system contains software infected by a virus, until and unless we
execute it there is nothing to fear. To avoid a virus infection use a genuine, updated
antivirus.
TROJAN
Trojan is quite an interesting malware. It generally comes disguised as a gift: if we visit
restricted sites we may get advertisements such as "You have won an iPhone,
click here to claim it", or popular paid games offered for free. Once the user is lured
into downloading and installing such an application, it creates a backdoor
and sends all user actions to the attacker. So, to spread a Trojan, if the attacker
chooses a popular, in-demand paid app, game, movie, or song, the chances of reaching
more people are quite high.
Trojans are non-self-replicating but hide behind another program. It is recommended
not to install any paid thing that comes free; you never know what is
hidden inside that application. Also use antimalware on the system for better safety.
RANSOMWARE
As the name suggests, it is quite an interesting malware which, after infecting the
system, blocks some popular and important resources of our computer and then
demands ransom money to give back access. Usually ransomware uses encryption
technologies to hold our data captive. The recommendation is the same as
mentioned above.
KEYLOGGER
Keylogger is a piece of malware that collects all keystrokes and sends them
to the attacker. So when a user enters credentials for any site, the credentials can be
recorded and sent back to the attacker, who can later use them for
account takeover. The recommendation here: if you are typing credentials for
any transaction-related site or anything involving critical information, use an
on-screen keyboard.
PHISHING
It is one of the oldest and still popular attacks, and is also used in many corporate
attacks. It is a simple attack where the attacker tricks the user by sending a fake link
that leads to a page which looks quite similar to the login page of the original site.
Once the user logs in on that page, the credentials are sent to the attacker and
the user can be redirected to the genuine site. The major weakness in this attack is the site
address: if a user verifies the site address properly, there is very little chance of
becoming a victim of a phishing attack.
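Since the site address is the weak point of the attack, the check a careful user performs mentally can be sketched in Python. The domains below are examples, and a real allowlist would come from the user's own bookmarks:

```python
from urllib.parse import urlparse

# Hypothetical allowlist: sites the user actually has accounts on.
TRUSTED_HOSTS = {"www.example-bank.com"}

def looks_suspicious(url):
    """Flag links whose host is not one we already trust."""
    host = urlparse(url).hostname or ""
    return host not in TRUSTED_HOSTS

print(looks_suspicious("http://www.example-bank.com/login"))      # -> False
print(looks_suspicious("http://www.examp1e-bank.com.evil.tld/"))  # -> True
```

Note how the second URL contains the bank's name as a subdomain of an unrelated host, a common phishing trick; parsing out the real hostname, rather than eyeballing the whole string, is what catches it.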
The information needed here is which sites the target has accounts on and
which sites the target visits quite often, so that the attacker can later create a
fake page of one of those and trick the user.
There are many new phishing techniques available now. One is desktop
phishing, where the hosts file of the victim's system is changed: an entry is added
to the hosts file mapping the site's original domain name to the address where the
fake page is hosted. So when a user types the domain name in the browser, the system
first checks the hosts file; the entry redirects the request to the fake page server, and
the fake page is loaded instead of the real one.
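A desktop-phishing entry of this kind can be spotted by scanning the hosts file for overrides of well-known domains. A minimal sketch follows; the file content and watched domain list are made up for illustration:

```python
# Made-up hosts file content: the last line is the kind of entry
# a desktop-phishing attack would plant.
hosts_content = """
127.0.0.1   localhost
203.0.113.5 www.example-bank.com
"""

# Domains we would never expect to be resolved via the hosts file.
WATCHED_DOMAINS = {"www.example-bank.com"}

def find_overrides(content, watched):
    """Return (ip, domain) pairs where a watched domain is overridden."""
    hits = []
    for line in content.splitlines():
        line = line.split("#")[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split()
        ip, names = parts[0], parts[1:]
        for name in names:
            if name in watched and ip not in ("127.0.0.1", "::1"):
                hits.append((ip, name))
    return hits

print(find_overrides(hosts_content, WATCHED_DOMAINS))
# -> [('203.0.113.5', 'www.example-bank.com')]
```

On a real system the same logic would read /etc/hosts (Linux) or C:\Windows\System32\drivers\etc\hosts (Windows) instead of the inline string.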
Another popular phishing attack is tabnabbing. In tabnabbing, when the user
opens a new tab the original page is changed into a fake page by URL redirection.
There are also other phishing variants such as spear phishing.
ONLINE SCAMS AND FRAUDS
One of the most widely faced issues online is spam mail and scams. Most email
users receive such mails on a daily basis. These mails usually attempt to trick users
into sending their personal information and ultimately skim their money. Sometimes
it is a huge lottery prize that we have won, or a relative in some foreign country who
left us a huge amount of money.
Maxwell Tobo Nov 24 at 10:42 PM
Beloved Friend,
I am writing this mail to you with heavy tears In my eyes and great sorrow in my heart because my Doctor
told me that I will die in three months time. Base on this development I want to will my money which is
deposited in a security company. I am in search of a reliable person who will use the Money to build charity
organization for the saints and the person will take 20% of the total sum. While 80% of the money will go to
charity organization and helping the orphanage. I grew up as an Orphan and i don't have anybody/family
member after the missing of my adopted son with Malaysia Airlines Flight MH370. Meanwhile at this point I
do not have anyone to take care of my wealth The total money in question is $7 5million dollars I will
provide you with other information’s once you indicate your willingness
Please contact me on my personal email on: maxtobo555@gmail.com
Yours sincerely,
maxwell tobo
FIGURE 11.1
A sample spam mail.
CHAPTER 11 Online Security
Scammers also try to exploit the human nature of kindness by writing stories that someone is stuck in a foreign land and needs our help, and other such incidents. Sometimes attackers pose as an authority figure asking for some critical information, or as the e-mail service provider asking to reset the password. There are various Ponzi schemes used by scammers with the ultimate purpose of taking away our hard-earned cash.
HACKING ATTEMPTS
There are cases where users with updated operating systems, antivirus, and firewall still face issues and become victims of hacking attacks. The reason is flaws in certain popular applications that can be found on any operating system, such as Adobe Acrobat Reader or simply the web browsers. These kinds of applications are targeted widely because they run on almost all operating systems and are widely used, so targeting them allows an attacker to hack as many users as possible. Attackers either create browser plugins or addons that help the user complete or automate a process while, in the backend, being used for malicious intentions, i.e., collecting all the user's actions performed in the browser.
WEAK PASSWORD
Weak passwords always play a major role in any hack. For ease of use, applications sometimes do not enforce password complexity, and as a result users choose simple passwords such as password, password123, Password@123, 12345, god, or their own mobile number. A weak password is not just about length and the characters used; it is also about guessability. Name@12345 looks like quite a complex password but can be guessed. So do not use passwords related to a name, place, or mobile number. Weak passwords can be guessed, or an attacker can bruteforce them if the length of the password is very small, so try to use random strings with special characters. Though they can be hard to remember, from a security point of view they are quite secure.
A strong password also needs to be stored properly. Let's say, for example, I created a huge metal safe to store all my valuable things and put the key just on top of it. It won't provide security: it's not just about the safe but also about the security of the key. Similarly, creating a very complex password won't serve the purpose if we write it down and paste it on our desk; that note also needs to be kept safe.
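As a sketch of the advice above, here is a short Python snippet that builds a random password from mixed character sets using the standard secrets module; the length and the particular character set are arbitrary choices, not a standard:

```python
# Sketch: generate a random password using a cryptographically secure source.
import secrets
import string

def random_password(length=16):
    # Letters, digits, and a handful of special characters (illustrative set).
    alphabet = string.ascii_letters + string.digits + "!@#$%^&*"
    return ''.join(secrets.choice(alphabet) for _ in range(length))

print(random_password())
```

Such a password is hard to remember, which is exactly why it should be kept in a safe place (for example, a password manager) rather than on a note stuck to the desk.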
SHOULDER SURFING
Shoulder surfing is always a challenge with a known attacker, a person whom you know and work with. If he or she wants to hack your account, it is quite easy to do while you are typing the password. The only way to make it difficult is to type some correct password characters, then some wrong characters, then remove the wrong characters and complete the password; or else do not enter your password when someone is around.
SOCIAL ENGINEERING
The first thing that comes to mind when we read social engineering is "there is no patch for human stupidity," or that the human is the weakest link in the security chain. This kind of attack is carried out against the trust of the user. The attacker first wins the trust of the victim, then collects all the information needed to execute one of the attacks we discussed above or any other attack. The only way to prevent becoming a victim is to trust no one; you never know when your boyfriend/girlfriend will hack your account. Jokes apart, do not disclose to anyone any information that has a possible significance for security.
So these were some of the security-related challenges that we face every day, but we have only covered the problems. Let's move on to the solutions.
ANTIVIRUS
As we discussed, there are various kinds of malware out there, and each one has a unique attack method and goal. There is a huge variety of these, and most computer users have faced this problem at some point of time.
Antivirus is one of the security products widely used by organizations as well as individuals. An antivirus is basically a software package which detects malware present on our machines and tries to disinfect them. Antiviruses have signatures and heuristics for malware, and based upon these they identify malicious code which could cause digital harm. As new malware is identified, new signatures and heuristics are created and pushed into the software as updates to maintain security against the new threats.
Many antiviruses have been infamous for slowing down the system and making it difficult to use; the frequent updates have also annoyed people a lot. Recently antiviruses have evolved to become less annoying and more efficient. Many solutions also provide additional features such as spam control and other online security features along with antivirus. The regular updates are not just for features but also to keep the signature database current. There are various choices in the market for antivirus solutions, free as well as commercial, but it all comes down to which one is the most up to date, because new malware keeps surfacing every day. One more thing to keep in mind is that there is also malware posing as antivirus solutions, hence we need to be very careful when choosing an antivirus solution and should download it only from trusted sources.
IDENTIFY PHISHING/SCAMS
We encounter a huge number of scam and phishing mails on a daily basis. Today e-mail services have evolved to automatically identify these and put them in the spam section, but still some manage to bypass the filters. Here are some tips to identify these online frauds:
• Poor language and grammar: Usually the body of such mails is written in poor language and incorrect grammar.
• Incredibly long URL and strange domain: The URLs mentioned in such e-mails, or the URLs of the phishing pages, can be checked by simply hovering the mouse over the link. Usually such URLs are very long and the actual domains are strange; this is done to hide the real domain and make the name of the page being phished appear in the browser address bar.
• Poor arrangement of the page: The arrangement of text and images is generally poor, as many attackers use tools to create such e-mails; the alignment also sometimes changes because of a change in resolution.
• E-mail address: The originating e-mail address should be checked to verify the sender.
• Missing HTTPS: If a page that is usually served over HTTPS is missing it this time, that is an alarming sign.
• Request for personal/sensitive information: Usually no organization asks for personal or sensitive information over e-mail. In case such an e-mail is received, it is better to verify by calling the organization before sending any such information.
• Suspicious attachments: Sometimes these e-mails also contain an attachment file in the name of a form or document, usually with strange extensions such as xyz.doc.exe to hide the original file type. Unless trusted and verified, these attachments should not be opened. In case an attachment needs to be opened, it should be done in a controlled environment such as a virtual machine with no network connection.
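The double-extension trick mentioned in the last tip can also be checked mechanically. The following is a minimal illustrative Python sketch; the function name and the list of risky extensions are assumptions, not an exhaustive rule:

```python
# Sketch: flag attachment names whose apparent type hides an executable extension.
RISKY = {'.exe', '.scr', '.bat', '.cmd', '.js', '.vbs'}   # illustrative subset

def looks_deceptive(filename):
    name = filename.lower()
    parts = name.rsplit('.', 2)   # e.g. 'xyz.doc.exe' -> ['xyz', 'doc', 'exe']
    # Deceptive if there is a "fake" inner extension and a risky real one.
    return len(parts) == 3 and '.' + parts[2] in RISKY

print(looks_deceptive('xyz.doc.exe'))   # True
print(looks_deceptive('report.pdf'))    # False
```

A check like this is only a heuristic; attachments that pass it should still be opened only in a controlled environment.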
UPDATE OPERATING SYSTEM AND OTHER APPLICATIONS
One of the major methods used by attackers to gain access to our machines is to attack through the applications running on the system. The operating system we use and the applications running over it contain flaws in the form of vulnerabilities. Attackers use exploit code targeting specific vulnerabilities to get a connection to computer systems. New vulnerabilities are discovered on a regular basis and hence the risk keeps increasing; on the other hand, patches for these vulnerabilities are also released by the vendors. Keeping our machine's software updated is an effective method to minimize the risk of being attacked.
Almost all operating systems come with mechanisms which allow them to update with the recent patches available. They also allow us to manually check for updates and install them if available. Apart from this, other applications that we use, such as multimedia players, document readers etc., also have patches; some of them are updated automatically while some need to be downloaded separately and installed.
Secunia PSI is a Windows-based application which helps us identify outdated software and is also capable of automating the process of updating it. It can simply run in the background and identify the applications that need to be updated; the user can then download the appropriate patch and install it. In case it is unable to do so, it notifies the user and provides useful instructions.
ADDONS FOR SECURITY
Web browsers are one of the most widely used applications on any platform and also the medium for most attacks. Let's learn about some easy-to-use addons which can help us stay secure online.
WEB OF TRUST (WOT)
WOT is a service which reviews website reputation based upon a crowdsourced method. Based on the reviews of the crowd, the addon lets us know how a website is rated on the scale of trustworthiness and child safety. Users can also rate websites and hence contribute to making the web a safer place. Details and comments about the website being visited can also be viewed, which helps users make an informed decision. Using the addon is pretty simple: visit a website and click on the WOT addon in the browser bar, and it will display the related details. The addon is available at https://www.mywot.com/en/download for different browsers.
FIGURE 11.2
Web of trust (WOT) in action.
FIGURE 11.3
Microsoft Baseline Security Analyzer scan result.
Similarly there is Linux Basic Security Audit (LBSA). This is a script which aims at making Linux-based systems safer and more secure, though the settings should be applied depending upon the requirements and might not be suitable for all scenarios. More details can be found at http://wiki.metawerx.net/wiki/LBSA.
Using such free and easy-to-use utilities we can certainly identify the gaps in our security and take appropriate steps to patch them.
PASSWORD POLICY
As we use keys for authentication in the real world, we use passwords in the digital world. Passwords are combinations of characters from different sets (letters, digits, special characters) which we provide to prove that we are the rightful owner of specific data or a service. Using passwords we access our computers, our social profiles, and even bank accounts. Though passwords are of such relevance, most of us choose to have a weak password. The reason is that, as humans, we have a tendency to choose things which are easy to remember. Attackers exploit this human weakness and try to access our valuable information through different techniques. Without going into the technical details of such attacks, some of them
PRECAUTIONS AGAINST SOCIAL ENGINEERING
One of the main techniques used by hackers to extract sensitive information from victims is social engineering. We as humans are naturally inclined to help others, respond to authority, and reciprocate. Using these and other similar weaknesses (in the context of security) of human nature, attackers exploit us to make us reveal something sensitive or take an action which might not be in our favor. People simply pose as the tech guy and ask for the current password, or claim to be the CTO of the company speaking and ask the receptionist to forward some details. To safeguard against such attacks, security awareness is very important. People need to understand what information is sensitive in nature. For example, it might seem that there is no harm in telling someone the browser version used at the enterprise, but this information is very valuable to an attacker. Also, one may trust but must verify: people should ask for proof of identity and cross-verify it to check that the person actually is who he or she claims to be. In case of doubt, it is better to ask someone higher in authority to make the decision than to simply do as told.
DATA ENCRYPTION
In the end, the motive behind most attacks is to access data. One step to stop this is to use disk encryption software, which encrypts the specified files on our machine with a strong encryption method and makes them password protected. Even if the machine is compromised, this makes it very difficult for the attacker to get the data. There are many solutions available which provide this functionality, such as BitLocker and TrueCrypt. It is advised to check that the software being used has no publicly known vulnerabilities in itself. Similarly, it is advised to store and send all sensitive online data in encrypted form.
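To show the underlying idea of turning a password into something that scrambles and unscrambles data, here is a deliberately simplified Python sketch. It is a toy keystream-XOR illustration of the encrypt/decrypt concept only and must NOT be used as a real cipher; products such as BitLocker rely on vetted algorithms like AES:

```python
# Toy illustration only: keystream XOR to show the encrypt/decrypt idea.
# NOT a secure cipher; shown purely to make the concept concrete.
import hashlib

def keystream(password, length):
    # Stretch the password into a stream of pseudo-random bytes.
    out, counter = b'', 0
    while len(out) < length:
        out += hashlib.sha256(password + counter.to_bytes(4, 'big')).digest()
        counter += 1
    return out[:length]

def toy_encrypt(password, data):
    ks = keystream(password, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))

toy_decrypt = toy_encrypt   # XOR is its own inverse

secret = b'my sensitive notes'
blob = toy_encrypt(b'passphrase', secret)
print(toy_decrypt(b'passphrase', blob) == secret)   # True
```

The point of the sketch is simply that without the password the stored blob is unreadable, which is what a real disk encryption product provides with far stronger guarantees.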
[Screenshot: the Windows BitLocker Drive Encryption control panel. It reads "Help protect your files and folders by encrypting your drives" and explains that BitLocker Drive Encryption helps prevent unauthorized access to files stored on the drives shown; the user can use the computer normally, but unauthorized users cannot read or use the files. Each listed drive shows its status as Off with a "Turn On BitLocker" link.]
FIGURE 11.5
BitLocker Drive Encryption.
CHAPTER 12
Basics of Social Networks Analysis
INFORMATION IN THIS CHAPTER
• Social network analysis
• Gephi
• Components
• Analysis
• Application of SNA
INTRODUCTION
In one of the previous chapters we discussed the importance of data management and analysis, and learned about certain tools which could be useful in the process. In this chapter we will deal with an associated topic: social network analysis (SNA). SNA is widely used in information science to study various concepts. It is a wide topic with applications in many fields; in this chapter we will attempt to cover its important aspects and the tools required for it, so that readers can further utilize it according to their needs.
As the name suggests, social network analysis is basically the analysis of social
networks. The social network we are talking about is a structure which consists of
different social elements and the relationship between them. It contains nodes which
represent the entities and edges representing relationships. What this means is that,
using SNA we can measure and map the relationships between various entities, these
entities usually being people, computers, a collection of them, or other associated
terms. SNA utilizes visual representations of the network for the purpose of better
understanding it and implements mathematical theories to derive results. There are
various tools that can be used to perform SNA and we will deal with them as and
when required.
Let’s deal with some basic concepts.
NODES
Nodes are used to represent entities. Entities are an essential part of a social network, as the whole analysis revolves around them. They are mostly depicted with a round shape.
Hacking Web Intelligence. http://dx.doi.org/10.1016/B978-0-12-801867-5.00012-4
Copyright © 2015 Elsevier Inc. All rights reserved.
218 CHAPTER 12 Basics of Social Networks Analysis
EDGES
Edges are used to represent relationships. Relationships establish how one node connects to another. This is very significant, as it helps perform various analyses such as how information will flow across the network. The number of edges connected to a node defines its degree: if a node has three links to other entities, it has degree 3.
NETWORK
The network is visually represented and contains nodes and edges. Different parameters of nodes and edges, such as size, color etc., may vary depending upon the analysis that needs to be performed.
Networks can be directed or undirected, which means that the edges might be represented as simple lines or as directed arrows. This primarily depends upon the relationships between the nodes. For example, a network of mutual connections such as friendship can be undirected, but a network of relations such as who likes whom will be directed.
Now we have a basic idea of SNA. Let’s get familiar with one of the most utilized
tools for it.
GEPHI
Gephi is a simple yet efficient tool used for SNA. The tool can be downloaded from http://gephi.github.io/ and the installation process is pretty straightforward. Once installed, the tool is ready to be used. The interface is simple and divided into different sections. There are three tabs present at the top left corner which allow working with the network in different ways: Overview, Data Laboratory, and Preview.
OVERVIEW
The Overview tab provides basic information about the network and displays the network visualization. It is primarily divided into three sections, which further have subsections. The left-hand panel consists of sections which allow partitioning and ranking of nodes and edges, and applying different layouts to the network based on different algorithms. The middle section consists of the space where the network is visualized and the tools to work with the visualization. The right-hand sections contain information about the network, such as the number of nodes and edges, and operations such as calculating the degree, density, and other network statistics.
FIGURE 12.1
Gephi Overview.
DATA LABORATORY
Under the Data Laboratory tab we can play with the data in its raw form. In this tab, the entities and their relationships are displayed in the form of a spreadsheet. Here we can add new nodes and edges, search for existing ones, import and export data, and much more. We can also work on columns: delete them, copy them, duplicate them etc. The data can also be sorted on different parameters by simply clicking on the column headers.
FIGURE 12.2
Gephi Data Laboratory.
PREVIEW
In the Preview tab we can change various settings related to the properties of the network graph, such as the thickness of the edges, color of the nodes, border width etc. This helps us set different values for different parameters so that we can make recognizable distinctions based on different properties of the graph. The settings are made in the left-hand panel and the changes are reflected in the rest of the section, available for preview.
There are many other tools available for SNA, some of which are SocNetV (http://socnetv.sourceforge.net/), NodeXL (http://nodexl.codeplex.com/), EgoNet (http://sourceforge.net/projects/egonet/) etc.
The term network here is much the same as in computing or in other fields such as math or physics. The terminology may change across areas of study, but the bottom line is that a network is a connection of different entities through relationships. As we discussed the network briefly earlier, now it's time to dig a bit deeper. To keep the approach simple, we will use the term "node" for entities and the term "edge" for relationships.
To create a meaningful and easy-to-understand network, or graphical representation of one, we must focus on certain areas: highlight the widely used and important nodes and edges, remove nodes with no data or edges, remove redundant data, and group similar nodes based on geographical location, community, or anything else that broadly relates them. These are the basic practices to remember while creating a meaningful and easy-to-understand network.
The components of a network, the edges and the nodes, have certain attributes based on which we can create a network. Those attributes play a vital role in understanding a network and its components better. Let's start with a node.
As discussed earlier, a node has a property called degree, which is simply the number of edges connected to the node and can be used as a measure of the node's importance. It also matters whether the edges are directed or undirected. Let's say the number of directed edges toward a node X is 5 and the number of directed edges away from X is 2. Then the degree of X is 7, because it is the sum of in-degree (5) and out-degree (2).
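The in-degree/out-degree arithmetic above can be sketched in a few lines of Python; the edge list mirrors the X example, with the other node names being illustrative:

```python
# Sketch: compute in-, out-, and total degree of a node from a directed edge list.
edges = [('A', 'X'), ('B', 'X'), ('C', 'X'), ('D', 'X'), ('E', 'X'),
         ('X', 'Y'), ('X', 'Z')]          # 5 edges toward X, 2 away from X

def degree(node, edges):
    in_deg = sum(1 for src, dst in edges if dst == node)
    out_deg = sum(1 for src, dst in edges if src == node)
    return in_deg, out_deg, in_deg + out_deg

print(degree('X', edges))   # (5, 2, 7)
```

For an undirected network the same idea reduces to counting every edge that touches the node.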
NODE ATTRIBUTES
Every node in a network can have a range of attributes that distinguish its properties.
An attribute can be binary: a simple true/false, yes/no, online/offline, or married/unmarried. This is one of the easiest representations of a node attribute, where we choose one of only two choices.
An attribute can be categorical when more than two options are available; for example, an attribute called relationship could use different categories as options, e.g., 1. Friend, 2. Family, 3. Colleague.
An attribute can also be continuous, based on information that cannot be the same for every node, such as date of birth or job position. We can use these as attributes to distinguish nodes quite easily.
EDGE ATTRIBUTES
DIRECTION
Based on direction, two major types of edges can be found.
1. Directed edges
2. Undirected edges
Directed edges
Directed edges are edges with a unidirectional relationship. The best example of a directed edge is X → Y. Here X is unidirectionally related to Y: we can say that Y is a child of X, or X loves Y, or any such one-sided relationship.
Undirected edges
Undirected edges can be used to establish mutual relationships, such as X ↔ Y or X — Y. The relationship can be anything, like X and Y being friends, classmates, or colleagues.
TYPE
Type is the kind of relationship that puts an edge in a group. Say there are different nodes and edges, but some of the edges are similar by type; then we can distinguish them quite easily. A type can be anything: friend, close friend, colleague, relative etc. It has a significant role in differentiating edges.
WEIGHT
Weight can represent the number of connections two nodes share. If X and Y share more than one mutual/undirected or directed edge, then the weight of the edge between them is that number. For example, if X relates to Y in five ways, the weight of that edge is 5. We can simply draw five edges between the two nodes, or draw a thicker edge between them to make it easy to see that these two nodes share a higher-weighted edge.
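A quick Python sketch of deriving weights by counting parallel links; the node names and link data are illustrative:

```python
# Sketch: derive edge weights by counting parallel links between node pairs.
from collections import Counter

links = [('X', 'Y')] * 5 + [('X', 'Z')]   # X relates to Y in five ways
# frozenset treats ('X', 'Y') and ('Y', 'X') as the same undirected edge.
weights = Counter(frozenset(pair) for pair in links)

print(weights[frozenset(('X', 'Y'))])   # 5
```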
Weight can be also of two types:
1. Positive
2. Negative
Positive weight
It’s based on the likelihood of a relationship. For easy understanding let’s talk about
a politician. There are many people who like a particular politician. So the relation¬
ship they establish with the same will be the positive weight.
Negative weight
Similarly, negativity, hate, or dislike can also be a factor in a relationship; that can be measured by negative weight.
RANKING
Based on the priorities of the relationships established between two nodes, edges can have different rankings; for example, X's favorite subject is Math and X's second favorite subject is Physics. Ranking exists to differentiate such priorities for easier understanding of a network.
BETWEENNESS
There are certain scenarios where two different groups of nodes are connected to each other by an edge. Such edges have the unique quality of joining two different groups or sets of nodes, and that quality can be called betweenness. There are many other attributes that can be found situationally. For the time being we can say that we have basic knowledge of a network, its components, and its attributes, so that if in future we get a chance to create a network or interpret a given one, we can at least understand its basics properly.
The core basics of the network and its components are covered above, but we still haven't covered the main topic, which is SNA.
As discussed earlier in the chapter, SNA is about mapping and measuring relationships between different entities. These entities can be people, groups, organizations, systems, applications, and other connected entities. The nodes in the network are usually people, but can be anything depending on what network we are looking at, while the links represent relationships or flows between the nodes. SNA provides both mathematical and graphical analysis of relationships, using which an analyst can deduce a number of conclusions, such as who is a hub in the network, how different entities connect to each other, and why, with a proper logical and data-driven answer. The factors that come into play, such as degree and betweenness, are already covered.
FIGURE 12.3
A small sample network to understand different components.
Gatekeeper/Boundary spanners
An entity which mediates, or we can say controls, the flow between one portion of the network and another. Earlier we used a different name for this role: boundary spanner. These are different keywords with the same definition. In our previous example, "F9" and "F3" are the gatekeepers.
Bridge
It is an edge which links two or more groups. In the previous example, there are three bridges: (1) F9 → F10, (2) F3 → F4, (3) F3 → F5.
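A bridge can also be found programmatically: an edge is a bridge if removing it disconnects its two endpoints. Here is a small Python sketch; the sample graph and node names are illustrative and not the chapter's figure:

```python
# Sketch: find bridges in an undirected graph by checking whether removing an
# edge disconnects its endpoints (simple, not the fastest algorithm).
def connected(adj, start, goal, skip_edge):
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for nxt in adj.get(node, ()):
            if {node, nxt} == skip_edge or nxt in seen:
                continue
            seen.add(nxt)
            stack.append(nxt)
    return False

def bridges(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return [e for e in edges if not connected(adj, e[0], e[1], set(e))]

sample = [('F1', 'F2'), ('F2', 'F3'), ('F1', 'F3'), ('F3', 'F4')]
print(bridges(sample))   # [('F3', 'F4')]
```

In the sample, F1, F2, and F3 form a triangle, so removing any one of those edges leaves an alternate path; only the edge to the pendant node F4 is a bridge.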
Liaison
An entity which has links to two or more groups that would otherwise not be linked, but is not a member of either group. Our previous example does not have any node in the position of liaison.
Isolate
As the name suggests, an isolate is an entity which has no links to other entities; generally a node without edges. In our previous example, we do not have any isolate nodes.
Since certain roles are not present in our example, here is a new network that contains all the roles, properly highlighted for easy understanding.
FIGURE 12.4
Network highlighting different roles.
SNA can be helpful in many ways to understand information flow. We can use it in a variety of situations: to predict exit poll results from the verdict of online users, to identify how and to what extent a piece of information will flow in a network of friends, to understand an organizational culture, or even to find the loopholes in a process.
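To make the "how far information flows" idea concrete, here is a small Python sketch that simulates spread through a friend network with breadth-first search; the friendship data and names are illustrative assumptions:

```python
# Sketch: simulate how far a message spreads through a friend network (BFS).
from collections import deque

friends = {
    'A': ['B', 'C'],
    'B': ['A', 'D'],
    'C': ['A'],
    'D': ['B'],
    'E': ['F'],        # E and F are disconnected from A's group
    'F': ['E'],
}

def reach(network, source):
    """Return each reachable node mapped to its distance (hops) from source."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        node = queue.popleft()
        for peer in network.get(node, ()):
            if peer not in dist:
                dist[peer] = dist[node] + 1
                queue.append(peer)
    return dist

print(reach(friends, 'A'))   # {'A': 0, 'B': 1, 'C': 1, 'D': 2}
```

Nodes absent from the result (here E and F) are ones the information can never reach, which is exactly the kind of conclusion SNA lets us draw.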
For a simpler example, in a generic scenario we can create a network of Twitter users of a community or organization and see who is following whom and who is being followed. This helps us understand who the key players are in that structure and who creates the most influence. Similarly, we can understand who is more of a follower and who are the leaders. In a network of professionals in an organization, it can be used to identify the people who create a hierarchy, and which path would be better if one professional needs to connect to another who is not a direct connection.
Similarly, it can be used to analyze a network of connected people to identify how a communicable disease would spread through the network and which links need to be broken before the whole network gets infected. Another example could be a network of the market leaders of an industry, used to identify who is the hub in that network and needs to be targeted and influenced for a decision to be taken.
Most of the attributes and functions that we have discussed in this chapter can be automatically calculated using Gephi; it also has many algorithms which can be utilized to perform layouts, identify key elements, implement filters, and perform various other operations. It can also be extended using various plugins, an option present under the Tools menu.
FIGURE 12.5
Sample network in Gephi with different values calculated.
SNA is used by various social network platforms and organizations which deal with connections between people; similarly, it has applications in many domains which depend on information science to grow their market.
We learned something new in this chapter and can use it in the future for easy understanding of any complex system by creating a simpler network for it. Here
CHAPTER 13
Quick and Dirty Python
INFORMATION IN THIS CHAPTER
• Introduction to programming
• Python intro
• Python components
• Examples and Samples
• Creating tools and transforms
INTRODUCTION
After covering many interesting topics related to utilizing different automated tools, in this chapter we will learn to create some. Sometimes there is a need to perform a specific task for which we cannot find any tool that suits the requirements; this is when some basic programming knowledge helps, so that we can quickly write some code to perform the desired operation. This chapter will touch upon the basics of the Python programming language. We will understand why and how to use Python and what the basic entities are, and then we will move on to creating some simple but useful code snippets. It is advised to have some programming knowledge before moving on with this chapter, as we will cover only the essentials related to the language and jump straight into the code. Though the examples used are simple, having some programming experience will be helpful.
Anyone who has some interest in computer science is familiar with the concept of programming. In simple terms it is the process of creating a program to solve a problem. To create this program we need a language in which we can write instructions for the computer to understand and perform the task. The simple objective of a computer program is to automate a series of instructions so that they need not be provided one by one manually.
PROGRAMMING VERSUS SCRIPTING
The language we are going to discuss in this chapter is Python, which is commonly termed a scripting language, so before moving further let's understand what that means. Usually the code written in a programming language is compiled to machine code using a program called a compiler to make it executable. For example, code written in the C++ language is compiled to create an exe file which can be executed on a Windows platform.
Hacking Web Intelligence. http://dx.doi.org/10.1016/B978-0-12-801867-5.00013-6
230 CHAPTER 13 Quick and Dirty Python
There is another kind of program, called an interpreter, which allows running code without it being compiled. If the execution environment for a piece of code is an interpreter, it is a script. Python is usually executed in such an environment and hence is commonly called a scripting language. This does not mean that a scripting language cannot be compiled; it simply is not usual. All scripting languages are programming languages.
INTRODUCTION TO PYTHON
Python is a high-level programming language created by Guido van Rossum which emphasizes the readability of code. Python is very quick to write and allows solving problems with a minimal amount of code, and hence is very popular among people who need to create quick scripts on the go, such as pentesters. There are various versions of Python, but we will focus on version 2.7 in this chapter. Though the latest version as of now is 3.4, most of the Python tools and libraries available online are based on version 2.7, and the 3.x versions are not backward compatible, hence we will not be using them. There are some changes in the 3.x versions, but once we get comfortable with 2.7 it won't require much effort to move to them, if required.
The main agenda of this chapter is not to create a course on Python; that would require a separate book in itself. Here we will cover the basics quickly and then move on to creating small and useful scripts for general requirements. The aim is to understand Python, write quick snippets, customize existing tools, and create our own tools as per requirements. This chapter strives to introduce the possibilities of creating efficient programs in a limited period of time, provide the means to achieve it, and then further extend it as required.
There are other alternatives to Python available, mainly Ruby and Perl. Perl is one of the oldest scripting languages and Ruby is widely used for web development (Ruby on Rails), yet Python is one of the easiest and simplest languages when it comes to rapidly creating something with efficiency. Python is also used for web development (Django).
INSTALLATION
Installing Python on Windows is pretty straightforward: simply download the 2.7 version from https://www.python.org/downloads/ and go forward with the installer. Linux and other similar environments mostly come preinstalled with Python.
Though not mandatory, it is highly recommended to install Setuptools and Pip for easy installation and management of Python packages. Details related to Setuptools and Pip can be found at https://pypi.python.org/pypi/setuptools and https://pypi.python.org/pypi/pip, respectively.
MODES
We can run Python in basically two ways: one is to directly interact with the interpreter, where we provide the commands through direct interaction and see the output (if any), and the other is through scripts, where we write the code into a file, save it as
IDENTIFIERS
In programming, identifiers are the names used to identify any variable, function, class, or other similar object used in a program. In Python, they can start with a letter or an underscore, followed by letters, digits, and underscores. They can also consist of a single character. So we can create identifiers accordingly, except for certain words which are reserved for special purposes, for example, “for,” “if,” “try,” etc. Python is also case sensitive, which means “test” and “Test” are different.
DATA TYPES
Python has different data types, but the type is decided by the value passed and does not need to be stated explicitly. Actually the data type is not associated with the variable name but with the value object; the variable simply references it. So a variable can be assigned to another data type after it already refers to a different data type.
C:\Python27>python
Python 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> test=10
>>> test
10
>>> test="This is a test"
>>> test
'This is a test'
>>>
FIGURE 13.2
Value assignment.
Commonly used data types are:
• Numbers
• String
• Lists
• Tuples
• Dictionaries
To define a number, simply assign a variable a number value, for example,
>>>samplenum=10
Note that there are various numerical types, such as float, long, etc.
To define a string we can take the help of quotes (both single and double), for example,
>>>samplestr="This is a string"
>>>samplestr2='This is another string'
We can also utilize both types of quotes in a nested form. To create multiline strings we can use triple quotes.
single code. The examples shown in this chapter follow this concept, and we will be using spaces.
Basic terms (class, function, conditional statements, loops, etc.)
Now let’s move forward with conditional statements.
The most basic conditional statement is “if.” The logic is simple: if the provided condition is true, it will execute the statement, else it will move on. The basic structure of “if” and associated conditions is shown below.
if condition:
    then_this_statement
elif condition:
    then_this_statement
else:
    then_this_statement
Example code
#!/usr/bin/python
a=10
b=12
c=15
if (a==b):
    print "a=b"
elif (b==c):
    print "b=c"
elif (c==a):
    print "c=a"
else:
    print "none"
Write this in a notepad file and save it as if_con.py. This code will result in the response “none” when executed in Python. The “elif” and “else” conditions are not mandatory when using an “if” statement, and we can have multiple “elif” statements. Similarly we can also have nested “if” conditions, where there are if statements within another if statement; just proper indentation needs to be kept in mind.
if condition:
    then_this_statement
    if nested_condition:
        then_this_nested_statement
    else:
        then_this_nested_else_statement
The “while” loop is next in line. Here we provide a condition and the loop will run as long as that condition is true. The structure of “while” is shown below.
while this_condition_is_true:
    run_this_statement
Example code
#!/usr/bin/python
a=10
c=15
while (a<c):
    print a
    a=a+1
Output
10
11
12
13
14
We can also utilize the “break” and “continue” statements to control the flow of the loop. The “break” statement is used to break out of the current loop, and the “continue” statement is used to pass control back to the start of the loop. There is one more interesting statement called “pass” which does nothing; it is used just as a placeholder.
Another useful construct is the “for” loop. Using it we can iterate through the items present within an object such as a tuple or list.
Example code
#!/usr/bin/python
sample_tup=('123','test',12,'w2')
for items in sample_tup:
    print items
Output
123
test
12
w2
We are simply taking the individual values in the tuple sample_tup, putting them into the variable items one by one, and printing them.
Example code
#!/usr/bin/python
str="String"
for items in str:
    print items
Output
First Function
Argument Return
Second Function
Argument Return
Here the function __init__ is the constructor of the class and is the first function which runs in the class. The variable “classobj” is the object for the class “sample_class” and using it we can communicate with the objects inside the class. As discussed earlier we can also create this as a module and call it inside another program; let’s take an example of importing modules.
Example code
#!/usr/bin/python
class sample_class:
    def __init__(self, classarg):
        self.cla=classarg
    def firstfunc(self):
        print "First Function"
        return self.cla+" Return"
    def secfunc(self):
        print "Second Function"
        return self.cla+" Return"
classobj=sample_class("Argument")
This file is being saved as mod.py and another file calls this as a module with the code:
#!/usr/bin/python
from mod import *
print classobj.firstfunc()
Output
First Function
Argument Return
In Python we can also create directories of modules for better organization through packages. They are hierarchical structures and can contain modules and subpackages.
WORKING WITH FILES
Sometimes there is a need to save or retrieve data from files; for this we will learn how to deal with files in Python.
First of all, to open a file we need to create an object for it using the function open and provide the mode of operation.
>>>sample_file=open('text.txt',"w")
Here the name sample_file is the object, and using the open function we are opening the file text.txt. If a file with this name does not already exist it will be created, and if it already exists it will be overwritten. The last portion inside the parentheses describes
the mode; here it is “w”, which means write mode. Some other commonly used modes are “r” for reading, “a” for append, “r+” for both read and write without overwriting, and “w+” for read and write with overwriting.
Now that we have created an object, let’s go ahead and write some data to our file.
>>>sample_file.write("test data")
Once we are done with writing data to the file we can simply close it.
>>>sample_file.close()
Now to read a file we can do the following:
>>>sample_file=open('text.txt',"r")
>>>sample_file.read()
'test data'
>>>sample_file.close()
Similarly we can also append data to files using the “a” mode and the write() function.
Python has various inbuilt as well as third-party modules and packages which are very useful. In case we encounter a specific problem that we need to solve using Python code, it is better to look for an existing module first. This saves a lot of time figuring out the steps and writing huge amounts of code: we simply import the module and utilize the existing functions. Let’s check some of these.
Sys
As stated in its help file, this module provides access to some objects used and maintained by the interpreter and functions that interact strongly with it.
To use it we import it into our program.
import sys
Some of the useful features provided by it are argv, stdin, stdout, version, exit(), etc.
Re
Many times we need to perform pattern matching to extract relevant data from a large amount of it. This is where regular expressions are helpful. Python provides the “re” module to perform such operations.
import re
Os
The “os” module in Python allows us to perform operating system-dependent functionalities.
import os
Some sample usages are to create directories using the mkdir function, rename a file using the rename function, kill a process using the kill function, and display the list of entries in a directory using the listdir function.
When executing this code, it will prompt the message “Enter something”, once we
input the value it will generate the response accordingly. For an input value “a” it will
generate the output “aaaa”.
COMMON MISTAKES
Some common issues faced during the execution of Python code are as follows.
Indentation
As shown in the examples above, Python uses indentation for grouping code. Some people use spaces for this and some use tabs. When running or modifying code written by someone else, we sometimes face an indentation error. To resolve this error, check the code for proper indentation and correct the instances; also make sure not to mix tabs and spaces in the same code, as it creates confusion for the person looking at the code.
Libraries
Sometimes people have completely correct code, yet it fails to execute with a library error. The reason is a missing library that is being called in the code. Though it is a novice mistake, sometimes experienced people also don’t read the exact error and start looking for errors in the code. The simple solution is to install the required library.
Interpreter version
Sometimes the code is written for a specific version of the language and breaks when executed in a different environment. To correct this, install the required version and specify it in the code as shown earlier in this chapter, or execute the code using the specific interpreter. Sometimes different programs require different versions; to solve this problem we can use virtualenv, which allows us to create an isolated virtual environment containing all the dependencies needed to run our code.
Permission
Sometimes the file permissions are not set properly to execute the code, so make the changes accordingly using chmod.
Quotes
When copying code from resources such as documents and websites, there can be a conversion between the single quote (') and the grave accent (`), which causes errors. Identify such conversions and correct the code accordingly.
So we have covered the basics of the language; let’s see some examples which can help us understand the concepts and their practical usage, and also get introduced to some topics not discussed above.
Similar to Shodan, discussed in a previous chapter, there is another service called ZoomEye. In this example we will be creating a script using which we will query
extracts data in the form of an entity (or entities) based upon the relationship. Maltego has a lot of inbuilt transforms and keeps updating the framework with new ones, but it also allows us to create new ones and use them; this can be very helpful when we need something custom according to our needs.
Before we move any further we need the “MaltegoTransform” Python library by Andrew MacPherson, which is very helpful in local transform development. It can be downloaded from the page https://www.paterva.com/web6/documentation/developer-local.php. Some basic examples of local transforms created using the library are also present at the bottom of the page. Once we have the library in our directory we are ready to go and create our first transform.
To create any program, first we need a problem statement. Here we need to create a transform, so let’s first identify something that would be helpful during our OSINT exercise. There is a service called HaveIBeenPwned (https://haveibeenpwned.com), created by Troy Hunt, which allows users to check if their account has been compromised in a breach. It also provides an application programming interface (API) using which we can perform the same function. We will be using v1 of the API (https://haveibeenpwned.com/API/v1) and provide an e-mail address to check if the supplied e-mail has any account associated.
To utilize the API we simply need to send a GET request to the service in the form shown below, and it will provide a JSON response listing the website names.
https://haveibeenpwned.com/api/breachedaccount/{account}
Let’s first specify the path of the interpreter
#!/usr/bin/python
Now we need to import the library MaltegoTransform
from MaltegoTransform import *
Once we have the main library we need to import some other libraries that will be required: “sys” to take user input and urllib2 to make the GET request.
import sys
import urllib2
Once we have imported all the required libraries, we need to assign the function MaltegoTransform() to a variable and pass the user input (the e-mail address) from the Maltego interface to it.
mt = MaltegoTransform()
mt.parseArguments(sys.argv)
Now we can pass the e-mail value to a variable so that we can use it to create the URL required to send the GET request.
email=mt.getValue()
Let’s create a variable and save the base URL in it.
hibp="https://haveibeenpwned.com/api/breachedaccount/"
C:\Python27>emailhibp.py foo@bar.com
<MaltegoMessage>
<MaltegoTransformResponseMessage>
<Entities>
<Entity Type="maltego.Phrase">
<Value>Pwned at ["Adobe","Gawker","Stratfor"]</Value>
<Weight>100</Weight>
</Entity>
</Entities>
<UIMessages>
</UIMessages>
</MaltegoTransformResponseMessage>
</MaltegoMessage>
FIGURE 13.7
Transform output.
We can see that the response is an XML-styled output and contains the string “Pwned at ["Adobe","Gawker","Stratfor"]”. This means our code is working properly and we can use this as a transform. Maltego takes this XML result and parses it to create an output. Our next step is to configure this as a transform in Maltego.
Under the Manage tab, go to the Local Transform button to start the Local Transform Setup Wizard. This wizard will help us configure our transform and include it in our Maltego instance.
In the Display name field provide the name for the transform and press Tab; it will generate a Transform ID automatically. Now write a small description for the transform in the Description field and the name of the author in the Author field. Next we have to select the entity type that this transform takes as input; in this case it would be Email Address. Once the input entity type is selected we can choose the transform set under which our transform will appear, which can also be none.
FIGURE 13.8
Transform setup wizard.
Copyrighted material
PA249
Introduction
Now click on next and move to the second phase of the wizard. Here, under the Command field, we need to provide the path to the programming environment we are going to use to run the transform code. In our case it would be
/usr/bin/python (for Linux)
C:\Python27\python.exe (for Windows)
Once the environment is set we can move to the Parameters field; here we will provide the path to our transform script. For example,
/root/Desktop/transforms/emailhibp.py (for Linux)
C:\Python27\transforms\emailhibp.py (for Windows)
One point to keep in mind here is that if we select the transform file using the browse button provided in front of the “Parameters” field, it will put only the file name in the field, but we need the absolute path of the transform to execute it, so provide the path accordingly.
FIGURE 13.9
Transform setup wizard.
After all the information is filled in, we simply need to finish the wizard and our transform is ready to run. To verify this, simply take an e-mail address entity and select the transform from the right-click menu.
FIGURE 13.10
Select transform.
FIGURE 13.11
Transform execution.
Now we have created our first transform and also learned how to configure it in Maltego. Let’s create another simple transform. For this example we will be using the website http://www.my-ip-neighbors.com/. It allows us to perform a reverse IP domain lookup; simply said, it finds the domains sharing the same IP address as the provided domain. While in the previous transform we provided an e-mail address as the input, here we require a domain name; but this website provides no API service, and hence we will have to send a raw GET request and extract the domains out of the web page using regular expressions through the library “re”.
#!/usr/bin/python
from MaltegoTransform import *
import sys
import urllib2
import re
mt = MaltegoTransform()
mt.parseArguments(sys.argv)
url=mt.getValue()
mt = MaltegoTransform()
opencnam="http://www.my-ip-neighbors.com/?domain="
getrequrl=opencnam+url
header={'User-Agent':'Mozilla'}
req=urllib2.Request(getrequrl,None,header)
response=urllib2.urlopen(req)
domains=re.findall("((?:[0-9]*[a-z][a-z\\.\\d\\-]+)\\.(?:[0-9]*[a-z][a-z\\-]+))(?![\\w\\.])", response.read())
for domain in domains:
    mt.addEntity("maltego.Domain", domain)
mt.returnOutput()
*http://txt2re.com/ can be used to create regular expressions.
Similarly we can create lots of transforms which utilize online services, local tools (e.g., an Nmap scan), and much more using Python. The examples shown above and some more can be found at https://github.com/SudhanshuC/Maltego-Transforms. Some other interesting transforms can be found at https://github.com/cmlh; otherwise they are just a quick GitHub search away (https://github.com/search?q=maltego+transform).
There is also a Python-based framework available, called Canari (http://www.canariproject.com/), which allows creating Maltego transforms easily.
There are various topics which we have not covered, as the scope is limited and the topic is very vast. Some of these are exception handling, multiprocessing, and multithreading. Below are some resources which can be helpful in this quest of learning Python.
RESOURCES
A great resource to learn more about Python and its usage is the Python documentation itself: https://docs.python.org/2/. Another great list of Python-based tools with a focus on pentesting is present at https://github.com/dloss/python-pentest-tools; the list is divided into different sections based on the functionality provided by each tool. It would be great to create something interesting and useful by modifying, combining, and adding to the mentioned resources.
So we have covered some basics of the Python language and also learned how to extend the Maltego framework through it. Through this chapter we have made an attempt to learn how to create our own custom tools and modify existing ones in a quick fashion.
This chapter is just an introduction to how we can simply create tools with a minimum amount of coding. There is certainly room for improvement in the snippets we have shown, in functional as well as structural terms, but our aim is to perform the task as quickly as possible.
Though we have tried to cover as much ground as possible, there is so much more to learn when it comes to Python scripting. Python comes with a large set of useful resources and is very powerful; by using it one can create a powerful toolset, and recon-ng (https://bitbucket.org/LaNMaSteR53/recon-ng) is a great example of it. We have discussed this reconnaissance framework in a previous chapter. One great way to take this learning further would be to practice more and create such tools which could be helpful for the community, and to contribute to existing ones such as recon-ng.
Slowly we are moving toward the end of this journey of learning. We have been through different aspects of intelligence gathering in different manners. Moving on, we will learn about some examples and scenarios related to our endeavor, where we can utilize the knowledge we have gained in a combined form.
CHAPTER
Case Studies and
Examples
INFORMATION IN THIS CHAPTER
• Introduction
• Case studies
• Example scenarios
• Maltego machines
INTRODUCTION
After working with so many tools and techniques and going through so many processes of information gathering and analysis, now it’s time to see some scenarios and examples where all this comes together for practical usage. In this chapter we will include some real scenarios in which we, or people we know, have used OSINT (open source intelligence) to collect the required information from very limited information. So without wasting any time let’s jump directly into case study 1.
CASE STUDIES
CASE STUDY 1: THE BLACKHAT MASHUP
One of our friends returned from the Black Hat US conference and was very happy about the meetings. Our friend works for a leading security company and takes care of US sales. He was very excited about a particular lead he got there. The person he met was in a senior position at a gaming company and interested in the services offered by our friend’s company. They had a very good networking session in the lounge while having drinks, and in the excitement he forgot to exchange cards. So he had to find the person and send him the proposal that he had committed to.
• Problem No. 1: He forgot his full name but remembers his company name and location.
• Problem No. 2: During the discussion the other person said that, as many people approach him with such proposals, he uses a different name on LinkedIn.
• Problem No. 3: We know the position of the other person, but it is not a unique position such as CEO or CTO.
Hacking Web Intelligence. http://dx.doi.org/10.1016/B978-0-12-801867-5.00014-8
When he came to us with this case we had a gut feeling that we could find him. We had some information about the person, though it did not include any primary information such as e-mail address or full name.
These are the steps we followed.
The first thing we asked him was whether he could recognize the person’s picture, or had he forgotten that also; to our good luck he said “yes.” So the major point in our case was that if we found some candidate profiles, he could validate and confirm the right person.
Step 1:
As usual, we first started with a simple Google query with the first name, his position, and the company name. Let’s say his position is senior manager and the company name is abc.inc. The query we used was
Senior manager abc.inc
Step 2:
We tried the same on Facebook to get an equivalent profile, but nevertheless no leads.
Step 3:
We went to LinkedIn, tried the same there, and failed.
Step 4:
We went to the company profile page and tried to visit all the employees’ profiles, but we found that there are more than 7000 registered employees there on LinkedIn, and it’s really a tough task to find anyone that way.
Step 5:
As we covered in Chapter 2, LinkedIn provides an advanced search feature, and we had direct data for some of the fields, so we decided to use that.
https://www.linkedin.com/vsearch/p?trk=advsrch&adv=true
We filled in fields such as title, company, and location, as our friend had this information. As a result we got many equivalent profiles, but this time far fewer results, which we went through one by one, finally finding the person in the twenty-first result, as he had shared that he had recently been to the conference. After visiting his profile we got a few more details about the person, and our friend confirmed that we had found who we were looking for.
What could have been done after this?
We might get his primary e-mail id, company e-mail id, and other details using different sources such as Maltego or simple Google. Using the image we might go for a reverse image search to get related images and their sources. We might get the blogs or websites created by that person, and much more. There were endless possibilities, but we stopped there because, for the time being, that was out of our scope. We sent the link to our friend, who then sent him a connection request and later got the deal, and so we got a small treat.
Step 1:
As we had the name of the person and the company name, we directly searched for the person on LinkedIn. We found his profile, which contains a lot of information, such as his current and previous work experience. We found that the person is one of the technical leads of that company. The LinkedIn profile also contains some of his articles and latest achievements. The person had recently earned the OSCP (Offensive Security Certified Professional) certification. We also found his GitHub account link in the LinkedIn profile. We visited each of his articles; most were about how he found bugs in many major sites, including some zero-days in popular CMS systems.
Step 2:
After getting these we visited his GitHub account. He wrote all his scripts to
automate the testing process in Python.
Step 3:
We did a simple Google search on his name and got many links, along with a slideshare account. We visited that slideshare account; there were some presentations on how to write your own IDS rules. In one of the older posts we found a comment linking to his older blog.
Step 4:
We visited that old blog of his, which consists of different road trips he took on his Bullet motorcycle.
Step 5:
We searched for his Twitter account and found an interesting post: he had recently attended one of the popular security conferences in Goa, India.
Step 6:
We visited the conference site and found that the person Mr. John Doe had given
a talk on network monitoring.
Step 7:
We searched for him on Facebook and got information about his hometown, current location, educational details, and more.
Step 8:
A quick people search on Yasni provided us a link to another website, belonging to a local security community, where we found his phone number, as he was the chapter leader. We verified this phone number through Truecaller and it checked out right.
It took almost 25-30 min, after which we stopped digging further. In the meanwhile our friend was ready with all the information he had gathered from the company website. Based on the information we collected from different sources, we concluded the following.
• Save the phone number and greet him with his name in case he calls.
• The first thing you need to ask him is how his talk went in Goa.
• Read his talk abstract and tell him it was great and that you regret having missed the talk.
• Expect questions on IDS rule writing; you can refer to the slideshare presentations for answers.
• Expect some questions on tool automation and that too in Python. So a quick
Python revision was required.
• Expect some questions on network penetration testing as he recently did
OSCP.
• Expect some questions on web application security, bug bounties, and zero-days
as he got listed in many.
• If he asks you about hobbies, tell him about your road trips and how you wanted to have a Bullet motorcycle but haven’t got one yet.
• If you get any question related to your vision, tell him something related to the company’s existing vision aligned with your personal thoughts.
• If he asks where you see yourself after some years, or about future plans, tell him you want to go for the OSCP certification and be a Red Team leader. This was true anyway.
Our friend had very good knowledge and experience in pentesting, and because of his expertise, and with a little homework on the company and on the background of the person, he got selected. Mr. John Doe was not only happy with his technical skills but also because he and our friend had many things in common.
So these were some interesting case studies. We have certainly added as well as subtracted some points here and there as required, but all in all these are the kinds of situations everyone faces. Let’s learn about some basic types of information related to commonly encountered entities and how to deal with them.
It’s quite easy to start with primary information such as a name or e-mail id to collect all the other information, but there are cases where we might not have primary information. That does not mean we cannot derive this primary information from secondary information. The process might be a bit more difficult, but it’s possible. So now we will discuss in particular a person’s details. What can be collected about a person? Where and how?
Below is some of the information we might be interested in collecting about a person.
PERSON:
• First Name
• Last Name
• Company Name
• E-mail Address (Personal)
• E-mail Address (Company)
• Phone Number (Personal)
• Phone Number (Company)
• Address (Home)
• Address (Company)
• Facebook Account URL
• LinkedIn Account URL
• Twitter Account URL
• Flickr Account URL
• Personal Blog/Website URL
• Keywords
• Miscellaneous
From the above list we can start with any point and gather most of the rest.
The steps may differ depending on what we got as a source and how we get to the
others one by one, but we will be using the same tools/techniques, just in a different order.
Whatever we take as a source, we basically need to start with a simple Google
search, or a search on any other popular traditional search engine such as Yandex. If we
get any related information, we use it to collect any of the other pieces of related
information by treating it as the source.
Let's say we got a simple name that contains a first name and last name; then we
can simply use a Google query to get results. Let's say that using Google we were able
to get the personal blog or website.
Visit that site to search for related information about the person: areas of interest,
age, date of birth, e-mail, hometown, educational details, or any such information
that can be used to get other details.
Let's say we got the educational details. Open Facebook and try to search for
the name along with the educational details. We may get the person's profile. On
Facebook we will get lots of information, such as the company he/she is working at,
friends, pictures of him/her, and sometimes other profile links along with the
personal e-mail address.
Now, using the company name and the person's name, we can get the LinkedIn profile
quite easily and can craft an e-mail address. Generally companies use a typical
pattern to create e-mail addresses. Let's say the company name is ABC Inc. and the site is
www.abc.com, and it uses the pattern of the first letter of the first name followed by the
last name, without any spaces. So from the person's name and company name we can easily
craft the e-mail address; or we can use a tool like theHarvester to harvest e-mail
addresses from the company domain name, and after looking at all the e-mails we can
easily pick the one associated with the person. In this way it is possible to collect
information through correlation.
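The e-mail crafting step above can be sketched as a tiny script. This is only an illustration: the patterns listed are common corporate conventions rather than an exhaustive set, and the name and domain used here are made up.

```python
# Generate candidate corporate e-mail addresses from a person's name
# and the company domain, using a few common naming patterns.
def craft_emails(first, last, domain):
    first, last = first.lower(), last.lower()
    patterns = [
        f"{first[0]}{last}",   # jdoe
        f"{first}.{last}",     # john.doe
        f"{first}{last}",      # johndoe
        f"{first}_{last}",     # john_doe
        f"{last}.{first}",     # doe.john
    ]
    return [f"{p}@{domain}" for p in patterns]

if __name__ == "__main__":
    for candidate in craft_emails("John", "Doe", "abc.com"):
        print(candidate)
```

Each candidate can then be compared against addresses harvested from the company domain to spot which pattern the company really uses.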
DOMAIN:
• Domain
• IP Address
• Name Server
• MX server
• Person
• Website
• Subdomains
• E-mail Samples
• Files
• Miscellaneous
So, as we discussed earlier, let's take the domain as the primary entity, and from that
we want to get all the other information mentioned above. If we simply want
the IP address of that domain, we just need to run a ping command in the
command prompt or terminal, depending on the operating system we use.
ping <domain name>
This command will execute and provide the IP address of the domain.
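The same lookup can also be done programmatically. A minimal sketch using Python's standard library (resolving the name directly instead of parsing ping output):

```python
import socket

# Resolve a domain name to an IPv4 address, the same information
# that appears in the first line of ping's output.
def domain_to_ip(domain):
    try:
        return socket.gethostbyname(domain)
    except socket.gaierror:
        return None  # the name did not resolve

if __name__ == "__main__":
    print(domain_to_ip("localhost"))
```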
For other domain-specific information, there are domain tools freely available
on the internet, and from the Whois record we will get different information such
as the registered company name, name server details, registered e-mail IDs, IP address,
location, and much more. Resources like w3dt.net can be very helpful here. Directly
using a domain tool we can find lots of information about a domain, or else we can use
different domain-specific Maltego transforms for the same purpose.
We can also use theHarvester to collect subdomains, e-mail addresses, etc., from a
domain name. From the e-mail addresses we can search for the profiles of the persons
on different social networking sites.
To get subdomains and particular files from the domain we can use SearchDiggity,
Google, or Knock (a Python tool).
To get different subdomains we can use the site operator, or create a Python script
which takes subdomain names from a wordlist and enumerates them along with the
domain provided.
site:domainname
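The subdomain-enumeration script mentioned above could look like the sketch below. The wordlist here is a tiny inline sample (a real run would load a larger list from a file), and a resolver failure is simply treated as "subdomain not found."

```python
import socket

# Try each candidate word as a subdomain of the given domain and
# keep the ones that actually resolve to an IP address.
def enumerate_subdomains(domain, words, resolve=socket.gethostbyname):
    found = {}
    for word in words:
        host = f"{word}.{domain}"
        try:
            found[host] = resolve(host)
        except socket.gaierror:
            pass  # does not resolve, so skip it
    return found

if __name__ == "__main__":
    sample_words = ["www", "mail", "ftp", "dev", "vpn"]
    for host, ip in enumerate_subdomains("example.com", sample_words).items():
        print(host, ip)
```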
To get a particular type of file from the domain with a keyword, we can use the filetype
or ext operator and run the below query:
site:domainname keyword filetype:ppt
So in this way we can get all the domain-specific information from different
sources.
These were some case studies and examples in which OSINT can be collected
and be helpful in our personal and professional lives.
As promised earlier in our next topic we will be learning about Maltego machines.
MALTEGO MACHINES
We have covered various aspects of Maltego in previous chapters, from understanding
the interface to creating local transforms. As this chapter is about combining
the knowledge we have covered till now, related to Maltego we will learn
how to create Maltego machines. Although we have already defined what a Maltego
machine is, for a quick recall: it is a programmatically connected set of transforms. It
allows us to take one entity type as input and move toward other types which are
not directly connected to it, but reachable through a set/sequence of transforms. There are
some inbuilt machines in Maltego, such as Company Stalker, which takes a domain entity
as input and runs various transforms in sequential fashion to get different types of
information from it, such as e-mail addresses, files, etc.
Maltego “Company Stalker” Machine
To create our own machine we need to use the Maltego Scripting Language (MSL).
The documentation for MSL is available as a PDF at http://www.paterva.com/MSL.pdf.
The documentation is clear and simple, and anyone with basic
programming skills can easily understand it. As all the terms and processes are
clearly described there, we do not need to cover them again, so let's straight away jump
into creating our own simple machine using the local transforms we learned to create in a
previous chapter.
Creating a Maltego machine is pretty simple. First we need to go to the Machines
tab, under which we can find the New Machine option. Clicking on it will bring up a
window where we need to provide the name and other descriptive details related
to the machine we are going to create. In the next step we need to choose the type of
machine we are going to create. For this we have three options:
• Macro: runs once
• Timer: runs periodically until stopped
• Blank: a blank template
FIGURE 14.2
Create Maltego Machine
Once we have selected the machine type, we can write the code for our machine
and include transforms in it at the appropriate positions from the right-hand side
panel, by selecting a transform and double clicking on it. The "start" block contains
the transforms and all the other execution parts. The "run" function is used to execute a
specific transform. To run functions in a parallel fashion we can include them inside
"paths." Inside "paths" we can create different "path" blocks, which run in parallel
with each other, but the operations inside a single path run sequentially. Similarly we
can provide different values, take user inputs, use filters, etc.
Let's create a simple machine which extracts e-mail IDs from a provided domain
and further runs our HIBP local transform on them. For this we need to provide
the machine name and select the macro machine type. Next we need to include the
inbuilt transforms which can extract e-mails from a domain, such as domain to e-mail
using search engines, Whois, etc. Then we need to include our local HIBP transform.
As we need to run these in parallel, we need to create a separate "path" for each e-mail
extraction transform. Our final code looks like this:
machine("sudhanshuchauhan.domaintoHIBP",
    displayName:"domaintoHIBP",
    author:"Sudhanshu",
    description:"Domain name to HaveIBeenPwned") {
    start {
        paths {
            path {
                run("paterva.v2.DomainToEmailAddress_AtDomain_SE")
                run("sudhanshuchauhan.emailhibp")
            }
            path {
                run("paterva.v2.DomainToEmailAddress_SE")
                run("sudhanshuchauhan.emailhibp")
            }
            path {
                run("paterva.v2.DomainToEmailAddress_Whois")
                run("sudhanshuchauhan.emailhibp")
            }
            path {
                run("paterva.v2.DomainToEmailAddress_PGP")
                run("sudhanshuchauhan.emailhibp")
            }
        }
    }
}
FIGURE 14.3
Our Maltego machine output.
Two important things need to be kept in mind: our local transform must
be integrated into Maltego before creating the machine, and the input and output data
types need to be taken care of when creating a sequence.
So we learned to create a Maltego machine. Though there is still much more to
explore and learn related to Maltego, we have attempted to touch upon every
important aspect of it.
In this chapter we have combined all the knowledge we have gained
till now and also saw some practical scenarios and examples. This is important
because in real-life projects it's not just about knowing things but also about implementing
and utilizing them in an integrated manner according to the situation and generating
a fruitful outcome.
In our next and last chapter we will be learning about certain general topics
related to the internet which are often connected directly or indirectly to
information gathering. Having a basic understanding of these terms will be helpful for
anyone utilizing the internet for investigative purposes.
Related Topics of Interest
15
INFORMATION IN THIS CHAPTER
• Introduction
• Cryptography
• Data recovery
• IRC
• Bitcoin
INTRODUCTION
In previous chapters we have learned about various topics which are associated to col¬
lecting and making sense out of data. We learned about social media, search engines,
metadata, dark web, and much more. In this last chapter we will cover some topics
briefly which are not directly related to open source intelligence (OSINT) but to the
computing and internet culture and its evolution. If you practice the information
provided in previous chapters it is very likely to encounter these topics somewhere.
CRYPTOGRAPHY
There has always been a need to transfer messages from one location to another. Earlier,
people used to send messages through messengers who traveled long distances to
deliver them. Slowly, a need to make this transmission secure came up. In situations like
war, a message being intercepted by the enemy could have changed the whole
situation. To tackle such scenarios, people started to invent techniques to conceal the
original message, so that even if the message were intercepted it could not be understood
by anyone except the intended receiver. One of the simplest examples is the Caesar cipher,
in which each letter is replaced by another at a fixed alphabet-position difference; so if the
shift is 3 (right), then A would become D, B would become E, and so on. In the modern
era, technology has advanced a lot, and so have the techniques both to encrypt messages
and to break that encryption.
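The Caesar cipher described above fits in a few lines of Python; here the shift is a parameter, with the classical value of 3 used in the example.

```python
# Caesar cipher: shift each letter a fixed number of positions,
# wrapping around at the end of the alphabet.
def caesar(text, shift):
    result = []
    for ch in text:
        if ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            result.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            result.append(ch)  # leave spaces and punctuation as they are
    return "".join(result)

if __name__ == "__main__":
    secret = caesar("ATTACK AT DAWN", 3)
    print(secret)              # DWWDFN DW GDZQ
    print(caesar(secret, -3))  # shifting back by 3 decrypts it
```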
BASIC TYPES
Symmetric key
In this type of cryptography both parties (sender and receiver) use the same key
to encrypt and decrypt the message. A popular symmetric key algorithm is the Data
Encryption Standard (DES); there are also modern variants of it such as Triple DES.
Hacking Web Intelligence. http://dx.doi.org/10.1016/B978-0-12-801867-5.00015-X
Copyright © 2015 Elsevier Inc. All rights reserved.
267
Asymmetric key
In this type there are two keys, public and private. As the name suggests, the public
key is openly distributed while the private key remains secret. The public key is used
to encrypt the message, whereas only the private key can decrypt it. This solved a major
issue with symmetric keys, which was the need for multiple keys for communication
with different parties. RSA is a good example of an asymmetric key algorithm.
Some other associated terms:
Hashing
In simpler terms, hashing is converting a character string into a fixed-size value.
Usually the hash is of a small length. Some commonly used hashing algorithms are MD5,
SHA1, etc.
Encoding
Encoding is simply converting characters into another form for the purposes of data
transmission, storage, etc. It is like translating one language into another so
that the other party can understand it. Commonly used encodings are UTF-8,
US-ASCII, etc.
The basic difference between these is that encrypted text requires a key to be
converted back to plain text, and encryption is mainly used for the confidentiality of a
message. Hashed text cannot be reversed back to the original text, and hashing is mainly
used for integrity checks and validation. Encoded text can be decoded back without
any key.
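The difference is easy to demonstrate with Python's standard library: an encoded value round-trips with no secret involved, while a hash is a fixed-size one-way value.

```python
import base64
import hashlib

message = b"open source intelligence"

# Hashing: a fixed-size digest that cannot be reversed to the input.
print(hashlib.md5(message).hexdigest())   # always 32 hex characters
print(hashlib.sha1(message).hexdigest())  # always 40 hex characters

# Encoding: just another representation, decodable without any key.
encoded = base64.b64encode(message)
assert base64.b64decode(encoded) == message
print(encoded.decode())
```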
We came across different examples, cases, and scenarios where we learned how data
or information plays a vital role in this digital world. Similarly, any digital data stored
in devices such as computers, laptops, mobile devices, etc., is equally important. As
these are personal devices, they contain more personal data and so should be taken
care of carefully. Any hardware issue, software malfunction, device crash, or theft
can lead either to losing that important data or to it ending up in the wrong hands, and
then the consequences are much worse. So storing any important data in digital form
requires a meaningful effort to make it secure. There are many solutions available, both
open source and commercial, to store data securely on these devices. Choose
any of those based on the level of confidentiality of the data. Apart from storing the
data securely and locally on a device, there are also cloud solutions available to
store our data in one place so that we can retrieve and use it as we desire.
Along with secure data storage and transmission, it is also recommended to take
secure backups from time to time to avoid any accidental loss of data. These solutions
are tightly based on what we learned above, that is, cryptography or encryption.
Today we use cryptography on a daily basis through technologies such as
SSL/TLS, PGP, digital signatures, disk encryption, etc. So here we can conclude that
encryption plays a vital role in securing our day-to-day digital or virtual life.
With the increase in computation power, the ability to crack encrypted messages
has also evolved. Attacks such as brute force and dictionary attacks are easy to perform
at high speed. There are also weaknesses in some algorithms which make it easy
to perform cryptanalysis on them. Given enough time and computation power, any
encrypted text can be decrypted, so the algorithms used today attempt to make the
process so time consuming that the decrypted text becomes worthless in the time spent
cracking it.
DATA RECOVERY/SHREDDING
Due to technological advancement, nowadays we prefer to store almost everything
in digital form. A person who needs to send his/her documents does not want to
visit a photocopy shop; he/she just wants to scan the hard copy once and use the
same soft copy any number of times. This is just a simple example to understand human
behavior nowadays. Storing important data as soft copies, in digital form,
raises some security risks. As we discussed above, damage to the device or
accidental deletion can lead to the loss of our important data. We just learned some
precautions, or in simple terms, what to do with digital data. But what if it gets deleted?
There are possible ways to recover it. For a naive user, data recovery is only possible
when the data is still present in the trash or recycle bin, but that is not so; the capability
of data recovery goes way beyond that. This is because of the very nature of the data
storage and deletion functions implemented by the operating system. To understand
this we must understand the fundamentals of data storage, or how data gets
stored on different storage devices.
There are different types of storage devices, such as tape drives, magnetic storage
devices, optical storage devices, and chips. Tape drives are not generally used
for personal purposes; earlier they were an integral part of enterprise storage systems, and
now they are likely being deprecated, so let's not talk about them. Apart from
tape drives, the other three are widely used. Magnetic devices are nothing but the hard
disk devices we use, popularly known as HDDs or hard disk drives, which store all
our data. When we delete data from our system, the operating system does not delete
the data from the magnetic disk; it just removes the address reference to that part
from the address table. Though the concept is quite the same for other media
types such as DVDs, as we use those storage devices for backup and HDDs
for general storage, we will focus on HDDs only. As we discussed, deleting data
from the system means removing its memory location details from the address table.
So what is an address table and how does it work? It's quite simple. Generally when we
store data on a device, it takes some memory from the HDD. The starting memory
location and the ending memory location define a piece of data on the hard disk. All these
memory location details are stored in a table called the address table. So when we search
for particular data, the system checks the address table to get the memory locations
allocated for it. Once it gets the memory location, it retrieves the data for us. As the
data is still present on the hard disk after deletion, we can recover it, unless it
has been overwritten by other data. Here deleting means deleting data from the system as
well as from the trash or recycle bin. So now we have some idea why data can be recovered
after deletion, but the major question that still stands is how? Let's take a look into that
as well.
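The address-table behavior described above can be mimicked with a toy model. This is purely illustrative and not how any real file system is implemented: the "disk" is a byte array, and "deleting" a file removes only its table entry, leaving the bytes in place, which is exactly why recovery tools work.

```python
# Toy model of an address table: deletion removes the table entry,
# not the underlying bytes, so the data remains recoverable until
# it is overwritten.
class ToyDisk:
    def __init__(self, size=64):
        self.disk = bytearray(size)
        self.table = {}       # file name -> (start, end) locations
        self.next_free = 0

    def write(self, name, data):
        start, end = self.next_free, self.next_free + len(data)
        self.disk[start:end] = data
        self.table[name] = (start, end)
        self.next_free = end

    def delete(self, name):
        del self.table[name]  # only the address reference is removed

    def read_raw(self, start, end):
        return bytes(self.disk[start:end])  # what a recovery tool sees

if __name__ == "__main__":
    d = ToyDisk()
    d.write("note.txt", b"secret")
    d.delete("note.txt")
    print(d.read_raw(0, 6))  # the bytes of "secret" are still on the "disk"
```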
FIGURE 15.2
Data shredding using FileShredder.
INTERNET RELAY CHAT
IRC, or Internet Relay Chat, is old school for many. It was developed by
Jarkko Oikarinen in the late 1980s. Though it was developed more than two decades ago,
the application is still popular, and people still love to use it. Statistical
data says it lost half of its users in the last decade, but for a product this old, the fact
that people are still willing to use it is a great achievement.
IRC is quite similar to any other chat application. It follows a client-server
architecture and uses the TCP protocol for communication. Earlier it used plain text
communication, but now it also supports TLS, or Transport Layer Security, for encrypted
communication. The major reason for its development was to use it as group chat
software, and it serves that purpose quite well. What we would generally call a chat
room is, in IRC terms, called a channel. Unlike other chat clients, it does not force a
user to register, but a user has to provide a nickname to start chatting. A user can chat
in a channel as well as directly with another user using the private message option. IRC
is widely used for different discussion forums, and we love to use it whenever we get
an opportunity.
Normally, to use IRC we need to install an IRC client on our system. There are many
clients available on the internet, for all kinds of operating systems, so download a
client which supports the operating system being used. Once we have an IRC client
installed, connect to a channel to start communicating with fellow channel members.
The chat process is also quite similar to a normal chat. It's basically line-based
chat: one user sends a message in a line, then another replies. Due to its
anonymity, many hackers prefer to use IRC. The major question here is how it is
going to help us in OSINT. It's quite simple: as there are various channels available, we
can choose one based on our interest, crowdsource our questions, and get responses
from different experts. We need to be in the right place at the right time to discuss what
is happening in the cyber world. We can get a clear picture of what is happening
all over the world and, if we are lucky, we might even get future
predictions, such as which group is preparing for a distributed denial-of-service
(DDoS) attack on a company, what the possible targets are, what attack
vectors hacktivists are currently using, and many more. The information we get from here
can be used to map cyberspace and its trends, make future predictions, discuss
a query, etc. So next time do not hesitate to use IRC; just provide a fancy name and
enjoy chatting. A simple web-based IRC platform is http://webchat.freenode.net/.
Simply enter a nickname and channel name, and start to explore.
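Under the hood, an IRC client just sends short text commands over a TCP socket. The sketch below shows the raw protocol lines a minimal client would send; the server address, port, nickname, and channel are placeholders, and a real client would also have to reply to the server's PING messages to stay connected.

```python
import socket

# Every raw IRC command is a space-joined line terminated by CRLF.
def irc_line(*parts):
    return (" ".join(parts) + "\r\n").encode()

def connect_and_join(server, port, nick, channel):
    sock = socket.create_connection((server, port))
    sock.sendall(irc_line("NICK", nick))
    sock.sendall(irc_line("USER", nick, "0", "*", ":" + nick))
    sock.sendall(irc_line("JOIN", channel))
    sock.sendall(irc_line("PRIVMSG", channel, ":hello from a script"))
    return sock

if __name__ == "__main__":
    # Placeholder network details; adjust to the network you actually use.
    s = connect_and_join("chat.freenode.net", 6667, "osint_guest42", "#test")
    print(s.recv(512).decode(errors="replace"))
    s.close()
```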
FIGURE 15.3
Freenode IRC.
BITCOIN
Anyone into information security, or keeping track of world media and especially
technical journals, must have heard the term "bitcoin." It was popular earlier in the
technical field for its new concept, but later, when the value of 1 bitcoin touched almost
$1000, it started to trend among common internet users. Many must be aware of this
already, but still we will discuss some of the important facts about bitcoin. Bitcoin can
be described as an electronic currency or digital cash developed by Satoshi Nakamoto.
Unlike normal currencies, it uses a decentralized concept called peer-to-peer technology
for transactions. It is based on an open source cryptographic protocol built around
SHA-256 hashes in hexadecimal form. The smaller unit of a bitcoin is called a satoshi;
100 million satoshis make one bitcoin. Bitcoin can also be described as a payment
system, as no bank, organization, or individual has the power to control or influence it.
It is always in digital form and can be transferred with a click to any individual across
the world. There are pros as well as cons to this. Some of the pros: we can convert
bitcoin into any currency, independent of country; we can transact anonymously, hence
it is quite popular on the darknet; and no one can fake, create, or devalue bitcoins.
Similarly there are a number of cons, such as that a transaction cannot be reversed, the
security of bitcoin is low as it exists only in digital form, and once a bitcoin wallet is
deleted it is lost forever.
Now we have a bit of an understanding of bitcoins, so it's also important to know how
to store them. We can store bitcoin only digitally, because it's digital data; we need a
bitcoin wallet to store it. The major disadvantage of this is that if we accidentally
delete our wallet, we lose all the money. So take backups at proper intervals to avoid
any such incident. The original bitcoin project site is http://www.bitcoin.org/.
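To give a feel for the SHA-256 puzzle that underlies bitcoin, here is a toy "mining" loop. Real bitcoin uses double SHA-256 over a binary block header at an enormously higher difficulty; this sketch only shows the idea that a valid block is hard to produce but trivial to verify, which is why no one can fake bitcoins.

```python
import hashlib

# Toy proof-of-work: find a nonce so that the SHA-256 hash of the
# data plus the nonce starts with a given number of zero digits.
def mine(data, difficulty=2):
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{data}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

if __name__ == "__main__":
    nonce, digest = mine("block: alice pays bob 1 BTC")
    print(nonce, digest)  # anyone can re-hash once to verify this pair
```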
FIGURE 15.4
Bitcoin wallet.
PA275
Index
Note : Page numbers followed by “f * and “b” indicate figures and boxes respectively.
A
Academic sites, J_8
Addictomatic, 22
Addons
buildwith. 48. 49f
chat notification, A1
Contact monkey, 52
follow.net. 49
onetab, 50-51
Project Naptha, .31
Reveye, 51
Riffle, 49-50, 50f
salesloft,31
for security. 21 1
HTTPS Everywhere, 212
NoScript, 212
WOT. 21 1.21 If
shodan. 48
Tineye, 31
wappalvzer. 48
whoworks.at. 50
YouTubc,32
Adobe Photoshop, 138-139
Advanced search techniques. 25
Faccbook, 25-26, 27f
Linkedln, 27-30, 27f
site operator. 31
Twitter, 30, 3If
Anonymity, 111
Anonymous network, 164
I2P, 165-168, 166f
Onion Router, 164-165, 165f
Antiviruses. 209
Application-based proxy
JonDo, 153-156, 154f-155f
Ultrasurf, 152-153, 153f
Application programming
interface (API). 246
Autocomplcte, 37-38
B
Betweenness, SNA
bridges. 225
defined. 222
factors, 222, 222f
gatekeeper/boundary spanners, 225
isolate, 225-227, 225f-226f
Liaison, 225
network reach, 223-224
nodes, role, 223
star/hub, 224
Bing
features, 33
operators
/,_83
“”,33
0,-86
&,36
+, 85-87
feed, 87, 87f
filetype.36
ip, 86, 87f
site, 36
Bitcoin, 272-274, 273f
Bill ocker, 215, 2 15f
Black Hat mashup, 253-254
Boardreader, 23
Bookmark, 32
Browser, 32
architecture, 34f
browser engine, 33
data persistence, 35-36
error tolerance, 36
javascript interpreter, 33
networking, 33
rendering engines, 33
threads, 36
UI backend, 33
user interface, 33
Chrome. 12
Epic browser. 30
features
autocomplete. 37-38
online and offline browsing, 36
private browsing, 36-37, 37f
proxy setup. 38
Firefox, 12-13
history of, 34
operations, 33-34
Buildwith, 48, 49f
Business/company search, 32
Glassdoor, 59-60, 60f
Linkedln. 59
Zoominfo,_60
275
Copyrighted material
276 Index
C
Carrot2, 72-73, 73f
CasePile. 194-196, 195f-196f
ChcckUscrnames, 61
Chromium, 32
Clearweb, 169-170
Contactmonkcy, 52
Content sharing websites, 17-18
Corporate websites, 17
Creepy
applying filter, 104, 104f
geolocation, 102
Plug-in Configuration button, 102, 102f
results, 103, 103f-I04f
search users, 102, 103f
Cryptography, 267
asymmetric key, 268
encoding, 268-269
hashing, 268
symmetric key, 267
Custom browsers. 46
categories, 45-46
Epic, 40
FireCAT, 43^44
HconSTF, 40-41
Mantra, 41-43, 42f
Oryon C,44.44f
TOR bundle, 45
Whitehat Aviator, 44-45
CybcrGhost, 161-162, 162f
D
Darknet, II
12P
create own site, 180-183, 180f-183f
download and install. 176
forum, JJJL 179f
git, ill. 177f
home. 176. 177f
ld3nt, 179, 179f
paste, 12S. 178f
Tor
DuckDuckGo, 123* 173f
files created, 175, 175f
HiddenServicePort, 174-175
Hidden Wiki, 1IZ 172f
Silk Road, UA
Torchan, 124. 174f
Tor hidden service, 175f. 176
Tor Wiki, 122. 173f
XAMPP,_LZ4
Darkweb, 170
Data encryption, 215-216, 215f
Data Encryption Standard (DES), 267
Data leakage protection (DLP), 145-146
Doc Scrubber, 146
geotags, 146
MAT, 115
MetaShield Protector. 145
M y DLP. 145
OpenDLP, 146
Data management/visualization
and analysis tools
CascFilc, 194-196, 195f-196f
excel sheet, 190-191, 19If
flowcharts, 192-193, 193f
KccpNotc, 197-198, 198f
Lumify, 198-199, 199f
MagicTree, 196-197, 197f
Maltego, 193-194, 194f
SQL databases, 191-192, 192f
Xmind, 199-201, 200f
data, 188
information, 188
intelligence, 188-190
DataMarket, 21
Data recovery /shredding, 269-270, 270f-271f
Deepweb. See also Darknet
advantages, 171
defined, 1Z0
di sad vantages, 171
Diggity Downloads, 143
Doc Scrubber, 146
Domain name system (DNS), 5-6, 8
DuckDuckGo, 62* 62f
E
E-mail, _5
Email-Rapportivc, 256
EmailShcrlock. 60. 61 f
Epic browser, 40
Error tolerance, 26
Excel sheet, 190-191, 19If
Exif Search, 136-137, 136f-137f
F
Facebook, 22, 25-26, 27f
Fingerprinting Organizations with
Collected Archives (FOCA),
139-140, 140f
FireCAT, 42^44
Follow.net, 49
Freedom, 40^41
Frcenct, 183-185, I83f-I84f
Copyrighted material
PA277
Index
G
Gecko, A5
Gephi
Data Laboratory tab. 219. 219f
installation, .218
Overview tab, 218-219, 219f
Preview tab. 220
Glassdoor, 59-60, 60f
Google
operators
-,-&l
..,81
M
*,£1
AND, j£l
allintext,
allinurl,22
AROUND, 80-81
cache, .82* 135f
calculator, .83.
convertor, 83-84
define, _8Q
ext,_8Q
filetype,_80
info. 82
intext, 79-80
intitle, _SQ
inurl. 79
NOT, _&1
OR, M
related, 82
site, 78-84, 79f
time, .82
weather, 82
search categories, 28
Google+, 24-25
Google Chrome, 38-39
Google Hacking Database, 83-84, 84f
Google Translate, 149
Government sites, 12
&
Ilachoir-metadata, 138-139
Hackerfox, 40^11
Hacking attempts, 208
Hard disk drive (HDD), 269
IIavelBeenPwned, 214, 214f, 256
Hello World program, 231, 23If
HconSTF, 40-41
Hideman, 163-164, 163f
HTTPS Everywhere, 212
HyperText Markup Language (HTML), 22
i
Id3nt. 179, 179f
ImageRaider, 70-71
Integrated database (IDBL 41
Intelligence
definition, 188-189
managing data, 189
structured data, 190
Internet
definition, 2
history, 2
working, 2
Internet Relay Chat (IRC), 271-272, 272f
Invisible Internet Project (I2P),
165-168, 166f
create own site, 180-183, 180f-183f
download and install. 176
forum, 128* 179f
git, J22, 177f
home, 12(2 177f
Id3nt, 179, 1791-
paste, J28* 178f
IP address, 3-4
iPhone, 137-138
IRC. See Internet Relay Chat (IRC)
ivMeta, 137-138, 138f
Ixquick,^!, 55f
J
Jeffrey’s Exif Viewer, 134-136, 135f
JonDo, 153-156, 154f-155f
installation. 154
interface. 154. 154f
running, 155, 155f
test, 154. 155f
Windows users. 153
JonDoFox,122
K
KeepNote, 197-198, 198f
Key logger, 206
Kngine. 63. 63f
KnowEm,iil
L
Linkedln.^i. 27-30, 27f, 258
LittleSis, 57-58
Lumify, 198-199, 199f
M
MAC address, _5
Magi cTree, 196-197, 197f
277
Copyrighted material
278 Index
Maltego
Collaboration, 127-128, 128f
commercial version. 124
community version. 124
domain to E-mail, 129-130, 130f
domain to website IP. 128-129, 129f
entity, _L24
Investigate, 126
machines, 125, 125f. 127
Company Stalker, 263, 263f
creating, 263-264, 264f
HIBP local transform, 264
MSL, 263
output, 265, 265f
Manage option, 126, I26f
Organize option, 126, 126f
person to website, 130-131, 13If
transform. 124
Maltego scripting language (MSL), 263
Maltego Transforms, 245-251, 248f-250f
Malwares
Key logger, 206
ransom wares, 206
restricted sites, 205-206
Trojan. 206
virus, 206
Mamma search engines, 55-56
Mantra, 41—43, 42f
Market Visual, ^8. 58f
Media access control address, .5
Metadata
creation of, 133-134
extraction tools
Exif Search. 136-137, 136f-137f
FOCA, 139-140, 140f
hachoir-metadata, 138-139
ivMeta, 137-138, 138f
Jeffrey’s Exif Viewer, 134-136, 1351
Mctagoofil, 140-142, 141f-142f
impact, 142-143
removal/DLP tools. 145
Doc Scrubber, 146
gcotags, 146
MAT, 445
MetaShield Protector, 145
MyDLP,_L45
OpenDLP, 146
Search Diggity, 143-145
Metadata anonymization toolkit (MAT), 145
Metagoofil, 140-142, 141f-142f
Meta search, .54
Ixquick, .55* 55f
Mamma, 55-56
Polymetn, 54 : 54f
MetaShield Protector, 145
Microsoft Baseline Security Analyzer (MBSA),
212, 213f
Mozilla Firefox, 38-39
MyDLP,445
N
Namechk. 61
NerdyData, 66-67, 67f
News sites. 17
NoScript, 212
0
Ohloh code, hi
Omgili, J3
Onetab, 50-51
.onion domain websites. 174
Onion Router, 164-165, 165f
Online anonymity
IP address, 147-148
proxy
Google Translate, 442*. 150f
page opened inside, 150, 151 f
types of. 15 1
whatismyipaddress, 149. 150f
VPN, 161
CyberGhost, 161-162, 162f
Ilideman, 163-164, 163f
Online scams/frauds, 207-208, 207f
Online security
addons. 211
HTTPS Everywhere, 212
NoScript, 212
WOT,2_LL 21 If
antiviruses. 209
data encryption, 215-216, 215f
hacking attempts, 208
jail broken iPhone, 204-205
malwares
Key logger, 206
ransom wares, 206
restricted sites, 205-206
Trojan, 206
virus, 206
operating system update, 210-211
password policy, 213-214, 214f
password reset, 204-205
phishing, 207, 210
scams and frauds, 207-208, 207f, 210
shoulder surfing, 208-209
Copyrighted material
PA279
Index
social engineering. 209. 215
spam message, 203-204
tools, 212-213, 213f
weak passwords, 208
OpenDLP, 146
Open source intelligence (OSINT)
academic sites, ±8
content sharing websites, 17-18
corporate websites. 17
demo, 255
government sites, 33
news sites. 17
public sources, 16b
search engines, 16-17
tools and techniques, 101
Creepy, 102-104
Maltego, 124—131
Recon-ng, 113-121
Search Diggity, 110-113
Shodan, 107-110
TheHarvester, 105-107
Yahoo Pipes, 121-124
WEBINT,36
weblogs/blogs, 18-19, 18f
Operating system
basic hardwares, _1H
Linux, _LL
Mac, 33
Windows, 10-11
Oryon C. 44. 44f
OSINT. See Open source intelligence (OSINT)
P
PeekYou,32
People search, .56
LittlcSis, 57-58
Market Visual, 38* 58f
Peek You, .51
Pipl, 56-57, 57f
Spokco,36
They Rule, 58-59
Yasni,32
Phishing, 207
Pipl, 56-57, 57f
Polymeta,33. 54f
Ports, 4
Pri me, 40-41
Private browsing, 36-37, 37f
Private IP address, 4
Programming language
Java, 11-12
Python, _L2
Project Naptha. 51
Protocol, 4-5
Proxy
application-based proxy
JonDo, 153-156, 154f-155f
Ultrasurf, 152-153, 153f
Google Translate, 332* 150f
page opened inside. 150, 15If
set up, 160-161, 160f
in Chrome, 38
in Firefox. 38
web-based proxy, 156
anonymouse.org, 156-158, 156f— 157f
Boomproxy.com, 159-160
FiltcrBypass, 159, I59f
Zcnd2, 158-159, 158f
whatismyipaddress,332* 150f
Public IP address, 4
Python. 230
classes, 239-240
common mistakes, 243-245, 244f
data types, 232-235, 232f-234f
functions, 239
Hello World program, 231, 23If
identifiers, 232
indentation, 235-238
installation. 230
Maltego Transforms, 245-251,
248f-250f
modes, 230-231
modules, 238-239
programming vs. scripting, 229-230
resource, 251-252
user input, 242-243
working with files
os, 241
re, 241
sys, 241
urllib2, 242
R
Ransom wares, 206
Raw browsers, 38-40
Recon-ng, 118f
commands, 114b, 115
installation. 1 14
Linkedln, 1 19
modules, 115, 115b, 116f-118f
penetration testing, 120
physical tracking, 119-120
PunkSpider in progress, 120, 121f
Rendering engines, 35
Reverse image search, 69
Google images, 70, 70f
ImageRaider, 70-71
TinEye, 70
Reverse username/e-mail search, 60
CheckUsernames, 61
EmailSherlock, 60, 61f
Facebook, 61
KnowEm, 61
Namechk, 61
Reveye, 61
Riffle, 49-50, 50f
Robots, 17
Robtex, 68
S
Salesloft, 51
Search Diggity, 143-145
basic requirement, 110
interface, 110, 110f
NotInMyBackyard, 112, 112f
scan-Bing tab, 111, 112f
scan-Google tab, 111, 111f
Shodan scan, 112, 113f
Search engine optimization (SEO), J&
Search engines, 53. See also specific search
engines
Secunia PSI, 211
Semantic search
DuckDuckGo, 62, 62f
Kngine, 63, 63f
Server, 1
Shodan, 68-69, 69f
banners, 107
filters, 107, 108f
popular searches, 107, 108f
results for query “port:21 country:IN”, 109, 109f
results for query “webcam”, 108-109, 109f
Shoulder surfing, 208-209
Silk Road, 174
Small web format (SWF), M
SNA. See Social network analysis (SNA)
Social media intelligence (SOCMINT), Jll
Social media search, 63
SocialMenlion, 63-64, 64f
Social Searcher, 64-65
SocialMention, 63-64, 64f
Social network analysis (SNA)
edges, 218
betweenness, 222-227, 222f
directed edges, 221
ranking, 222
type, 221
undirected edges, 221
weight, 221
Gephi
Data Laboratory tab, 219, 219f
installation, 218
Overview tab, 218-219, 219f
Preview tab, 220
network, 218
nodes, 217, 220
Social network websites, 21f
Facebook, 22
features, 21 b
Google+, 24-25
LinkedIn, 23
Twitter, 24
Social Searcher, 64-65
SOCMINT. See Social media intelligence
(SOCMINT)
Source code search, 66-67
NerdyData, 66-67, 67f
Ohloh code, 67
Spiders, 17
Spokeo, 56
SQL databases, 191-192, 192f
Storage devices. 269
Surface web, 17
T
Tape drives. 269
TheHarvester
in action, 105, 105f
HTML results, 106, 106f
sources, 106-107
TheyRule, 58-59
TinEye, 70
Top level domains (TLDs), 5-6, 6b
Topsy, 65, 65f
Tor, 164-165, 165f
DuckDuckGo, 173, 173f
files created, 175, 175f
HiddenServicePort, 174-175
Hidden Wiki, 172, 172f
Silk Road, 174
Torchan, 174, 174f
Tor hidden service, 175f, 176
Tor Wiki, 173, 173f
XAMPP, 174
TOR bundle, 45
Torchan, 174, 174f
Trendsmap, 66
Trojan, 206
Truecaller, 74-75
Tweetbeep, 66
Twitter, 24, 30, 31f
Topsy, 65, 65f
Trendsmap, 66
Tweetbeep, 66
Twiangulate, 66
U
Ultrasurf, 152-153, 153f
Uniform resource locator (URL), 6-8
V
Virtualization, classifications, 7-8
Virtual private network (VPN), 161
CyberGhost, 161-162, 162f
Hideman, 163-164, 163f
Virtual world, 19
Virus, 206
Vital Information Resources Under Siege, 206
W
Wappalyzer
WayBack Machine, 69
W3dt, 68
Weak passwords, 208
Web 2.0, 19-20
Web 3.0, 32
Web-based proxy, 156
anonymouse.org, 156-158, 156f-157f
Boomproxy.com, 159-160
FilterBypass, 159, 159f
Zend2, 158-159, 158f
Web browser, 7
WEBINT, 16
WebKit, 35
Weblogs/blogs, 18-19, 18f
Web of trust (WOT), 211, 211f
Web search engine, 1
Whitehat Aviator, 44-45
Whois, 68
Whoworks.at, 50
Wise Data Recovery, 270, 270f
Wolfram Alpha, 71-72, 72f
World Wide Web (WWW)
vs. internet, 3
media types, 3
X
Xmind, 199-201, 200f
Y
Yahoo
contents, 88
operators
88
+, 88-90
define, 89
intitle, 89-90, 90f
link, 88-89, 89f
OR, 88
site, 88
Yahoo Pipes, 121-124, 123f
Yandex, 90
defined, 90
operators, 91
/,M
!, 92-93
!!, 93
93-94
(), 93, 93f
*, 94
&, 91
&&, 91
+, 90-99
~, 91
«,M
cat, 98-99
date, 96-97
domain, 97
host, 96
inurl, 95
lang, 97-98
mime:filetype, 95-96, 95f
/number, 91-92
rhost, 96
site, 96
title, 94-95
url, 95
Yasni, 62
Z
Zend2, 158-159, 158f
Zoominfo, 60